Linked Science as a producer and consumer of big data in the Earth Sciences

From this document you will learn the answers to the following questions:

What does BioPortal stand for?

What is the name of the combined metadata engine?

What does Linked Science do with data?

Similar documents
ARM Data Center Experience in Linking Big Data

Report to the NOAA Science Advisory Board

A Linked Science Investigation: Enhancing Climate Change Data Discovery with Semantic Technologies

Scientific Computing at NCEAS

AgroPortal. a proposition for ontologybased services in the agronomic domain

In 2014, the Research Data Purdue University

Software Description Technology

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries

Open Ontology Repository Initiative

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Exploration and Visualization of Post-Market Data

Disributed Query Processing KGRAM - Search Engine TOP 10

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Interrogation d'entrepôts distribués et hétérogènes

Change in Emphasis: Recasting Resource Investments and the Rise of Special Collections

NASA s Big Data Challenges in Climate Science

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

How To Write An Nccwsc/Csc Data Management Plan

Databases & Data Infrastructure. Kerstin Lehnert

Global Scientific Data Infrastructures: The Big Data Challenges. Capri, May, 2011

Data Management Framework for the North American Carbon Program

ChemCloud - Chemical e-science Information Cloud. Adrian Paschke, Freie Universitaet Berlin Stephan Heineke, FIZ CHEMIE

PODD. An Ontology Driven Architecture for Extensible Phenomics Data Management

Biomedical Informatics Applications, Big Data, & Cloud Computing

Data Curation for the Long Tail of Science: The Case of Environmental Sciences

Exploitation of ISS scientific data

Digital libraries of the future and the role of libraries

ONTOLOGY-BASED GENERIC TEMPLATE FOR RETAIL ANALYTICS

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)

CIESIN Columbia University

Summary of Responses to the Request for Information (RFI): Input on Development of a NIH Data Catalog (NOT-HG )

Project Knowledge Management Based on Social Networks

Report of the DTL focus meeting on Life Science Data Repositories

Considering the Way Forward for Data Science and International Climate Science

The Importance of Bioinformatics and Information Management

An Ontology-Based Information Management System for Pharmaceutical Product Development

INRA's Big Data perspectives and implementation challenges. Pascal Neveu UMR MISTEA INRA - Montpellier

Building Semantic Content Management Framework

Data Publishing Workflows with Dataverse

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

Case studies of ecological integrative information systems: The Luquillo and Sevilleta Information Management Systems

Text Mining for Health Care and Medicine. Sophia Ananiadou Director National Centre for Text Mining

A Capability Maturity Model for Scientific Data Management

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

How To Write A Blog Post On Globus

Building a virtual data center for the biological, ecological and environmental sciences

The EcoTrends Web Portal: An Architecture for Data Discovery and Exploration

"Data Manufacturing: A Test Data Management Solution"

Service Road Map for ANDS Core Infrastructure and Applications Programs

Fédération et analyse de données distribuées en imagerie biomédicale

Overcoming the Technical and Policy Constraints That Limit Large-Scale Data Integration

Digital Marketplace - G-Cloud

Ontology-Based Discovery of Workflow Activity Patterns

Geospatial Data Stewardship at an Interdisciplinary Data Center

ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY

Big Data Europe

Information Management. Corinna Gries

Data Management. Facility Access Challenges: Rudi Eigenmann NEES Operations Headquarters NEEScomm Center Purdue University

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

An Ontology-based Architecture for Integration of Clinical Trials Management Applications

The mission of academic libraries is to support research, education,

Model Project on the Ecosystem Data Management. Research Center for Coastal Lagoon Environments, Shimane University

The ebbits project: from the Internet of Things to Food Traceability

Organic Data Publishing: A Novel Approach to Scientific Data Sharing

IO Informatics The Sentient Suite

Data Lifecycle Management

EXCOM 2015 Tromsø, Norway August 2015 SCADM Report (Standing Committee on Antarctic Data Management)

Semantically-enabled (large-scale) Scientific Data Integration (SESDI)

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21) $100,070,000 -$32,350,000 / %

Web Services and Development of Semantic Applications

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

NASA Earth Science Research in Data and Computational Science Technologies Report of the ESTO/AIST Big Data Study Roadmap Team September 2015

Cyberinfrastructure Planning within the LTER Network Planning Grant Context. Barbara Benson James Brunt Peter McCartney John Porter John Vande Castle

Semantic Search in Portals using Ontologies

Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing.

Annotea and Semantic Web Supported Collaboration

Department of Energy Strategic Roadmap for Earth System Science Data Integration

HPC & Visualization. Visualization and High-Performance Computing

Enabling the Big Data Commons through indexing of data and their interactions

Semantic Exploration of Archived Product Lifecycle Metadata under Schema and Instance Evolution

A Library Data Management Platform Based on Linked Open Data

NASA Earth System Science: Structure and data centers

GIS Initiative: Developing an atmospheric data model for GIS. Olga Wilhelmi (ESIG), Jennifer Boehnert (RAP/ESIG) and Terri Betancourt (RAP)

Evaluating Semantic Web Service Tools using the SEALS platform

Deep Carbon Observatory Data Science Interim Report, Dec

Enabling Self Organising Logistics on the Web of Things

LabArchives Electronic Lab Notebook:

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper

The National Consortium for Data Science (NCDS)

Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object

Describing Web Services for user-oriented retrieval

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Second EUDAT Conference, October 2013 Workshop: Digital Preservation of Cultural Data Scalability in preservation of cultural heritage data

Finding a Good Ontology: The Open Ontology Repository Initiative

Data Management Considerations for the Data Life Cycle

Supporting Change-Aware Semantic Web Services

Modernizing Simulation Input Generation and Post-Simulation Data Visualization with Eclipse ICE

National eresearch Collaboration Tools and Resources nectar.org.au

SQLSaturday#393 Redmond 16 May, End-to-End SQL Server Master Data Services

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD

Transcription:

Linked Science as a producer and consumer of big data in the Earth Sciences Line C. Pouchard,* Robert B. Cook,* Jim Green,* Natasha Noy,** Giri Palanisamy* Oak Ridge National Laboratory* Stanford Center for Biomedical Informatics Research ** Presented to the Ontology Summit 2012 March 8, 2012

What is Linked Science? Scientific collaborations are growing and becoming more interdisciplinary, Linked Science is: A practice that accounts for objects of study from an end-to-end perspective exhibits key trends in the role of data means of communication between stakeholders are key characterized by experimental, theoretical and simulation methods systematic emphasis on validated datasets, value-added data products, traceable information and integrated processes First Linked Science Workshop, International Semantic Web Conference, October 2011 Second Linked Science Workshop, ISWC, October 2012, Boston 2

Linked Science in the Earth Sciences Coupled Climate-Land Surface Model FLUXNET Forest Biomass Forest Biomass Coupled Climate-Land Surface- Hydrology Model Discovery and access from heterogeneous sources Simulations, models, experiments, remote sensing, GIS, molecular and omics databases, publications Metadata and semantics integration Workflows, scenario development, data and process re-use, provenance Engaging communities of scientists, educators, librarians, developers, volunteers Relies upon cyber-infrastructure promoting open source Complex systems of systems, networks of projects, repositories, archives, publishers IPCC Report 3

Near-term future nodes Existing nodes Metadata Extraction 4 Mercury: federated metadata engine for earth sciences data NBII BDP LTER EML USA-NPN BDP ORNL DAAC FGDC OBFS IAI LP DAAC EML FGDC DIF Internal Metadata Index LBA I-LTER TERN SAEON FGDC Stored at Mercury Server

Coupling the Mercury federated metadata engine and BioPortal Query Controller Transaction Display Ontology Index Uses BioPortal Rest s for programmatic access Returns ontology concepts, super- and subclasses Provides additional keywords Provides context Uses these for new searches 5

BioPortal provides access to ontologies Domain ontologies are used within the engineered system to improve facetted search results 6

Ontology-based search results Concepts acquire context: biomass as Material or biomass as Energy Additional search terms Super-classes may have different properties 7

Metrics assessing the impact of using ontologies in search y = frequency x = count of index terms Matching the top 100 Mercury parameters to ontology terms Frequency count: 79% of the Top 100 keywords have at least one match in the chosen ontologies N = 99, 2 values missing (plant, leaf) water : 38 air, carbon = 28 8

Lessons Learned User-friendly display Current display may be confusing. What are the options? send the user to a new page implement a new display dynamically driven by ontology relationships Ontology content SWEET provides a good basis, but needs to be further specified for the needs of this Data Center Many ontologies provide only few relationships Implementation Adding ontology entities to a keyword index helps with recall but cannot substitute for semantic annotations of the metadata documents 9

Thank you ORNL DAAC and Mercury http://mercury.ornl.gov ORNL DAAC ontology service http://mercury.ornl.gov/ontologydemo ORNL DAAC instance of BioPortal http://mercury-ncbo.ornl.gov Stanford Center for Biomedical Informatics Research BioPortal http://bioportal.bioontology.org Stanford Center for Biomedical Informatics Research Protégé ontology editor http://protege.stanford.edu The submitted manuscript has been authored by the US Department of Energy, Office of Science of the Oak Ridge National Laboratory, managed for the U. S. DOE by UT-Battelle, LLC, under contract No. DE-AC05-00OR22725. 10