PODD. An Ontology Driven Architecture for Extensible Phenomics Data Management

From this document you will learn the answers to the following questions:

What is the study of plant phenmics?

What does an antology describe?

What is the abbreviation for Phenomic FTIR Microscopy Gas Exchange?

Similar documents

INRA's Big Data perspectives and implementation challenges. Pascal Neveu UMR MISTEA INRA - Montpellier

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

National eresearch Collaboration Tools and Resources nectar.org.au

Service Road Map for ANDS Core Infrastructure and Applications Programs

Report of the DTL focus meeting on Life Science Data Repositories

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India

D5.3.2b Automatic Rigorous Testing Components

Towards the construction of an integrated Wheat Information System

Building Semantic Content Management Framework

Big Data Management Assessed Coursework Two Big Data vs Semantic Web F21BD

AgroPortal. a proposition for ontologybased services in the agronomic domain

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

A Strategy for Plant Breeding Data Management in International Agricultural Research

#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf

Dendro: collaborative research data management built on linked open data

The growing challenges of big data in the agricultural and ecological sciences

The Development of the Clinical Trial Ontology to standardize dissemination of clinical trial data. Ravi Shankar

Presente e futuro del Web Semantico

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

EUR-Lex 2012 Data Extraction using Web Services

A collaborative platform for knowledge management

Research Roadmap for the Future. National Grape and Wine Initiative March 2013

Data deluge (and it s applications) Gianluigi Zanetti. Data deluge. (and its applications) Gianluigi Zanetti

BYODs & FAIR Data Stewardship

Cloud-Based Self Service Analytics

Data Management at UT

Genevestigator Training

i2b2 Clinical Research Chart

data.bris: collecting and organising repository metadata, an institutional case study

It s all around the domain ontologies - Ten benefits of a Subject-centric Information Architecture for the future of Social Networking

Supporting Change-Aware Semantic Web Services

S1. Training to sustain evolutionary biology

Research Data Management

Industry 4.0 and Big Data

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

The future of eresearch user support

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper

Queensland recordkeeping metadata standard and guideline

i2b2 Clinical Research Chart

EMBL Identity & Access Management

Software Architecture Document

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

Ganzheitliches Datenmanagement

Text Mining for Health Care and Medicine. Sophia Ananiadou Director National Centre for Text Mining

Linked Statistical Data Analysis

REACCH PNA Data Management Plan

Managing enterprise applications as dynamic resources in corporate semantic webs an application scenario for semantic web services.

How To Understand The Difference Between Terminology And Ontology

Data Driven Discovery In the Social, Behavioral, and Economic Sciences

Automatic Timeline Construction For Computer Forensics Purposes

Introduction to Research Data Management

CitationBase: A social tagging management portal for references

Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness

A grant number provides unique identification for the grant.

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Ontology and automatic code generation on modeling and simulation

E-Business Technologies for the Future

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

A Service for Data-Intensive Computations on Virtual Clusters

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

Globus Research Data Management: Introduction and Service Overview

SmartLink: a Web-based editor and search environment for Linked Services

Research Collaboration in the Cloud: - the NeCTAR Research Cloud

A Framework for Collaborative Project Planning Using Semantic Web Technology

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Amit Sheth & Ajith Ranabahu, Presented by Mohammad Hossein Danesh

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

Naming in Distributed Systems

Management of Research Data Procedure

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

Transcription:

PODD An Ontology Driven Architecture for Extensible Phenomics Data Management Gavin Kennedy Gavin Kennedy PODD Project Manager High Resolution Plant Phenomics Centre Canberra, Australia

What is Plant Phenomics? Phenome = Genome X Environment Genomics is accelerating gene discovery but how do we capitalise on these data sets to establish gene function and development of new genotypes for agriculture? High throughput and high resolution analysis capacity now the factor limiting discovery of new traits and varieties Accelerated systematic analysis of the phenotypes that arise from the combination of genome and environment In the next 50 years we must produce more food than we have consumed in the history of mankind Megan Clarke, CSIRO CEO 2009

Phenomics from the Leaf to the Field Imagine a plant breeder walking his trials, logging plant performance from distributed sensors with his mobile phone or logging on to Phenonet from home to view his wheat in real time

Plant Phenomics: A Case Study High Resolution Plant Phenomics Centre (HRPPC) Multiple platforms for deep phenotyping High resolution phenotyping of model and crop plants Scale from small model plants to crop fields Scale from a stand of trees to the plant cell Phenomics is Characterised by Multiple Heterogeneous High Volume High Resolution Data Generation Platforms PlantScan Scanalyzer MRI Phenonet TrayScan NIR Phenomobile FTIR Microscopy Gas Exchange

PODD PlantScan Data Phenonet Phenomobile Metadata TrayScan The PhenomicsOntology Driven Data repository A research data and metadata repository. PODD Metadata Repository Data Metadata PODD Data Stores Managing PhenomicsData from Multiple Heterogeneous High Volume High Resolution Data Generation Platforms A methodology for managing and publishing research data outputs. A semantic web data resource.

A word on the Semantic Web The Semantic Web is a "web of data" that enables machines to understand the semantics, or meaning, of information on the World Wide Web.

Another word on the Semantic Web Semantic Web technologies (i.e. Resource Description Framework, or RDF) allow us to not only describe data files, but to link them across the web.

RDF RDF (Resource Description Framework) is a simple format for making knowledge assertions. Composed of subject predicate object statements. Gavin (subject) Likes (predicate) Coffee (Object) Or Investigations (subject) Have (predicate) Materials (object) With these statements we make assertions about a domain and thus model domain knowledge.

The PODD Project Web based data management system. Manage multiple heterogeneous data resources. Data objects are annotated with metadata. Metadata is Ontology based to support data discovery and analysis Metadata maintained in format to support the semantic web.

Annotating with Metadata Metadata Data about data Information that supports understanding and contextualisation of the data. Data Metadata Format of data Experimental Protocols Time and date of capture Platform captured on Organism imaged Growth stage of organism Genotype of organism Treatments Analysis Protocol Metadata solves Heterogeneity

Annotating with Ontologies Ontology Common Vocabulary A set of terms used to describe a common domain.

Using Ontologies Annotation This image shows the wheat plant on the left has increased salt tolerance (TO_0006001) Discovery Show me all experiments associated with the term salt sensitivity (TO_0000429) Modelling a domain OBI_0000050 : platform A platform is an object_aggregate that is the set of instruments and software needed to perform a process. Annotation means we describe a domain object or process. Modelling means we categorize a domain object or process, and its relationship to other domain objects or processes.

Why Use Ontologies? * Integrated Biological Systems To share common understanding of the structure of information among people or software agents To enable reuse of domain knowledge To make domain assumptions explicit To separate domain knowledge from the operational knowledge To analyse domain knowledge * Ontology Development 101: A Guide to Creating Your First Ontology Natalya F. Noy and Deborah L. McGuinness http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html

Modelling Phenomics Data with an Ontology In the PODD Ontology we model every thing as objects: Experiments (Investigations) Plants (Materials) Treatments Environments Measurement Platforms Temporal Events Raw Data (Data) We then define the relationships between objects: Investigation has Material Material has Observations Material references Genotype Data references Material

The extensible PODD Ontology Project Project Plan Investigation Platform Analysis Event Genotype Treatment Material Material Container Data Environment Design Sex Observation/ Phenotype Treatment Archive Data Gene Sequence Measurement Measurement Parameter

Managing Phenomics Data with an Ontology The PODD ontology is written in an ontology schema language called OWL/RDFS. Using the PODD ontology as our template... We then create instances of the metadata objects and their relationships (in RDF). And attach the data files to the relevant objects.

The PODD Ontology Drives Repository Functions Presentation: Rendering of the web pages for: object creation; editing, display, etc. Management: Objects stored as per the model in the ontology. Validation: Ontology reasoning performed against objects. Discovery: SPARQL Queries or metadata searches. Refine the Domain? Extend the Ontology. Change the Domain? Change the Ontology.

Phenomics Data Management Requirements * Capture Manage Store Secure Annotate Distribute Publish Discover Multiple heterogeneous proprietary data generation platforms. Move data off localised servers/workstations Automate data & metadata capture Centralise data access Organise data at the project level Manage data files at remote data stores/data clouds. Secure data at the project level. Protect IP. Provide context and meaning to the data Delivering data to clients Sharing data with collaborators Public Online Access. Data publication identifiers. Online Access. Semantic Web Enabled. * Based on analysis of HRPPC Information Systems Requirements, March, 2009

Phenomics Data Management Requirements Annotate Provide context and meaning to the data Context and meaning through metadata and annotation. Metadata through PODD ontology objects and relationships between objects. Extended annotation through external ontologies (e.g. Gene Ontology, Trait Ontology) Trait Ontology Stress Trait (80 objects) Abiotic Stress Trait (40 objects) Stem Strength Leaf Rolling Water Stress Trait (8 objects) Chemical Stress Sensitivity (32 objects) Nutrient Sensitivity (12 objects) Micronutrient Sensitivity (3 objects) Macronutrient Sensitivity (3 objects) Salt Sensitivity (6 objects) Salt Tolerance (6 objects) Potassium Chlorate Tolerance Pesticide Sensitivity Oxidative Stress (20 objects) Biotic Stress Trait (40 objects) Project: Salt Tolerance comparison in Uni of Adelaide Wheat Breeding Lines (Public Access) Principal Investigator: Mark Tester, Lead Institution: University of Adelaide Object: Material, Title: Salt_Tol_1_1 http://www.plantphenomics.org.au/podd/object/poddobject:4632 Project: Salt Tolerance comparison in Uni of Adelaide Wheat Breeding Lines (Public Access) Principal Investigator: Mark Tester, Lead Institution: University of Adelaide Object: Material, Title: Salt_Tol_1_2 http://www.plantphenomics.org.au/podd/object/poddobject:4633...

Phenomics Data Management Requirements Publish Public Online Access. Data publication identifiers. Projects are secure, only authorised users can access them, until Projects can be published, granting public (read only) access to the data. PODD objects are identified by persistent publishable URLs (e.g. DOIs or ANDS PODD objects are identified by persistent publishable URLs (e.g. DOIs or ANDS PIDS). http://hdl.handle.net/10378.3/14058

Resources Plant Phenomics Test Instance: http://poddtest.plantphenomics.org.au/ Plant Phenomics Production Instance: http://podd.plantphenomics.org.au/ Mouse Phenomics Production Instance: http://podd.australianphenomics.org.au PODD Project Website: http://projects.arcs.org.au/trac/podd An ontology-centric architecture for extensible scientific data management systems, Yuan-Fang Li, Gavin Kennedy, Faith Ngoran, Philip Wu, Jane Hunter, Future Generation Computer Systems, In Press, Accepted Manuscript, Available online 8 July 2011. Contact: Gavin.Kennedy@csiro.au Ph: 0413 337 819

The PODD Collaboration PODD Project Manager Gavin Kennedy University of Queensland eresearch Lab: Faith Davies (Developer) Simon McNaughton (Developer) Peter Ansell (Developer) Jane Hunter (eresearch Lab Leader) APPF/HRPCC/CSIRO Xavier Sirault (Science Leader, HRPPC) Xueqin Wang (Tester, documentor) Bob Furbank (APPF HRPPC Director) APPF/Plant Accelerator/Uni of Adelaide: Bogdan Masznicz (Bioinformatician) Jianfeng Li (Developer) Mark Tester (APPF TPA Director) APN Philip Wu (Developer) Martin Hamilton (Developer) Adrienne McKenzie (APN Head of Network Services) Monash Univesity Yuan-Fang Li (Developer, Guru) NeAT Andrew Treloar (Deputy Director ANDS) Paul Coddington (Projects Manager, ARCS) ALA Donald Hobern (Director, ALA) This work is part of a National eresearch Architecture Taskforce (NeAT) project, supported by the Australian National Data Service (ANDS) through the Education Investment Fund (EIF) Super Science Initiative, and the Australian Research Collaboration Service (ARCS) through the National Collaborative Research Infrastructure Strategy Program.