Enabling a data management system to support the good laboratory practice Master Thesis Final Report Miriam Ney (09.06.2011)




Overview: Description of Task. Phase 1: Requirements Analysis (Good Laboratory Practice, Scientific Workflow, Laboratory Notebook, DataFinder). Phase 2: Implementation (DataFinder Configuration, Origin of Data / Provenance Integration, Evidential Archiving, Signing Data). Phase 3: Evaluation. Conclusion. Slide 2

Description of Task: Enabling a data management system to support the good laboratory practice. Data management system: DataFinder, an open-source project developed by DLR for data management, workflow management and metadata handling. Good Laboratory Practice (GLP): scientific conduct of experiments; regulations from the DFG, the OECD, universities, ... Part of GLP: the laboratory notebook. Slide 3

Phase 1: Requirements analysis. What is the good laboratory practice? "The principles of Good Laboratory Practice (GLP) have been developed to promote the quality and validity of test data used for determining the safety of chemicals and chemical products." (OECD Principles on Good Laboratory Practice, as revised in 1997.) "[The recommendations] are designed to provide a framework for the deliberations and measures which each institution will have to conduct for itself according to its constitution and its mission." (Deutsche Forschungsgemeinschaft: Sicherung guter wissenschaftlicher Praxis / Safeguarding Good Scientific Practice, 1998, p. 50.) Slide 4

Phase 1: Requirements analysis. What does a scientific workflow look like? Picture adapted from: www.belab-forschung.de. Slide 5

Phase 1: Requirements analysis. What is a laboratory notebook? "Das Laborbuch ist ein Tagebuch des experimentierenden Naturwissenschaftlers" (The laboratory notebook is the diary of the experimenting scientist). (Schreiben und Publizieren in den Naturwissenschaften by Hans F. Ebel, Claus Bliefert, Walter Greulich; chapter 1.3, page 16.) Slide 6

Phase 1: Background. What is the DataFinder? Connect to a repository. Slide 7

Phase 1: Background. What is special about the DataFinder? Structuring of data in a standardized way through a data model: restricting the user to a layout, requiring the user to enter metadata. Using heterogeneous storage backends for your data: the best-fitting storage solution depending on the data; existing solutions can be kept; offline storage is possible. Extension by scripts: adjusting to your environment, automation of data processing steps. Slide 8

Phase 2: Implementation. DataFinder Configuration. General: data model. Scientific documentation needs credibility, traceability and durability. Extensions to the DataFinder: process documentation, evidential preservation, signing data. Slide 9

Phase 2: Implementation. Feature 1: Process documentation (Provenance Integration). Provenance (from Latin provenire, "to come from"): origin of data, source. Motivation: credibility, chain of events, integrity. Tasks: developing a provenance model for the Good Laboratory Practice (PrIMe and OPM); enhancing the provenance storing system noblivious (graph database with REST interfaces); integration into the DataFinder (developing a Python script extension). Slide 10

Feature 1: Origin of Data (Provenance Integration). Developing the OPM model. Slide 11
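The Open Provenance Model the slide refers to expresses a workflow step through a process, the artifacts it consumes and produces, and the agent controlling it. A minimal sketch, assuming a flat edge list (the thesis' actual model is not reproduced here; the function name is illustrative, but the edge labels are the standard OPM relations):

```python
def opm_step(process, inputs, output, actor):
    """Return the OPM edges for a single documented workflow step:
    input artifacts are 'used' by the process, the result
    'wasGeneratedBy' the process, and the process
    'wasControlledBy' the experimenter (the agent)."""
    edges = [("used", process, artifact) for artifact in inputs]
    edges.append(("wasGeneratedBy", output, process))
    edges.append(("wasControlledBy", process, actor))
    return edges
```

For example, `opm_step("measure", ["sample", "protocol"], "dataset", "scientist")` yields one `used` edge per input plus the generation and control edges, which a graph database can store directly as labelled relationships.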

Feature 1: Origin of Data (Provenance Integration). Provenance storing system: Java implementation. Server: Jetty. Graph database: neo4j. Interfaces: storing provenance (REST), extracting provenance (REST), extracting provenance (Servlet). Open source (Apache License Version 2.0). Slide 12

Feature 1: Origin of Data (Provenance Integration). DataFinder extension: process documentation. Goal: documentation of the data items added, according to the good laboratory practice (scientific workflow). Slide 13

Feature 1: Origin of Data (Provenance Integration). DataFinder extension: process documentation. Realization: the user imports a file (output) into the DataFinder repository; an active import listener reacts and a dialog for input pops up; information on the input and output is sent to the provenance store. Changes in the DataFinder core: automatic execution of a script, integration of a concept for event listeners, extensions to the Script API. Slide 14
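The event-listener concept added to the DataFinder core can be sketched as follows. This is a minimal illustration only: `EventBus`, the event name `"item-imported"`, and `on_import` are assumptions, not part of the actual DataFinder Script API.

```python
class EventBus:
    """Tiny publish/subscribe hub illustrating the listener concept:
    scripts subscribe to core events and are called when one fires."""

    def __init__(self):
        self._listeners = {}

    def subscribe(self, event, callback):
        self._listeners.setdefault(event, []).append(callback)

    def fire(self, event, **payload):
        for callback in self._listeners.get(event, []):
            callback(**payload)


records = []

def on_import(path, **_):
    # In the real extension this would open the input dialog and send
    # the input/output description to the provenance store; here we
    # only record the imported path.
    records.append(path)


bus = EventBus()
bus.subscribe("item-imported", on_import)
bus.fire("item-imported", path="/repository/study1/result.dat")
```

The design point the slide makes is that the import action itself triggers the documentation script automatically, so the user cannot forget to document a data item.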

Phase 2: Implementation. Feature 2: Evidential Preservation. Motivation: "Recommendation 7: Primary data as the basis for publications shall be securely stored for ten years in a durable form in the institution of their origin." (Deutsche Forschungsgemeinschaft: Sicherung guter wissenschaftlicher Praxis / Safeguarding Good Scientific Practice, 1998, p. 55.) Tasks: evaluation of important archiving characteristics; extraction of the relevant data using provenance; integration of a preservation service; connecting it to the DataFinder. Slide 15

Feature 2: Evidential Preservation. DataFinder extension: use of provenance information. Goal: extraction of the data relevant for the preservation process. Slide 16

Feature 2: Evidential Preservation. DataFinder extension: use of provenance information. Realization: the user chooses a study report and activates a script; the script extracts the relevant input items from the provenance system; the relevant data items are added to an archive; the archive is stored in the DataFinder. Slide 17
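The archiving step above can be sketched as a small helper that bundles the report and its provenance-derived input items. This assumes a plain ZIP container and local file paths; the actual archive format and repository calls used by the thesis script are not specified on the slide.

```python
import os
import zipfile


def build_archive(report, related_items, archive_path):
    """Bundle a study report and the input items that the provenance
    query returned into a single archive file, which the script would
    then store back into the DataFinder repository."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        for item in [report] + list(related_items):
            zf.write(item, arcname=os.path.basename(item))
    return archive_path
```

Keeping the report and its inputs in one self-contained container is what makes the ten-year preservation requirement practical: the archive can later be handed to a preservation service as a single unit.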

Feature 2: Evidential Preservation. Integration of a preservation service: the Beweissicheres Laborbuch (evidential laboratory notebook) project (BeLab), a DFG project of the Physikalisch-Technische Bundesanstalt Braunschweig, the Karlsruhe Institute of Technology and the Universität Kassel. Goal of the project: development of a service which characterizes the preservation time of an item, characterizes the legal trustworthiness of an item, and stores the archive securely. Slide 18

Feature 2: Evidential Preservation. DataFinder extension: integration of BeLab. Goal: use the service to obtain a classification of the time span in legal issues and to increase the trustworthiness of DataFinder items. Realization: the user chooses an archive and activates a script; the script sends the archive to the BeLab service via WS-Security; the service processes the archive and returns preservation information, which is stored. Slide 19

Phase 2: Implementation. Feature 3: Signing digitally. Motivation: authenticity in general, attesting authenticity. Tasks and concept: for data, the signature is kept as a metadata item; for metadata, the values are extracted to XML and then stored as an XML Signature; integration into the DataFinder. Simple concept: signature of the data as a separate file. Slide 20

Feature 3: Signing digitally. DataFinder extension: integration of signatures. Realization: the user chooses a file and executes a script; a signature file is generated (PKCS #7); the signature file is stored in the DataFinder. Slide 21
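Generating a detached PKCS #7 signature file, as the slide describes, can be sketched by shelling out to the openssl command line. This is an assumption for illustration only: the thesis script may use a different library, and all file names and the signer certificate here are placeholders.

```python
import subprocess


def sign_detached(data_file, cert_file, key_file, sig_file):
    """Write a detached PKCS #7 (S/MIME) signature for data_file,
    using the openssl CLI. The signature lives in its own file, so
    it can be stored in the DataFinder next to the data item."""
    subprocess.run(
        ["openssl", "smime", "-sign", "-binary",
         "-in", data_file, "-signer", cert_file, "-inkey", key_file,
         "-outform", "DER", "-out", sig_file],
        check=True,
    )
    return sig_file
```

Keeping the signature detached (rather than wrapping the data) matches the "signature of the data as separate file" concept: the original item stays byte-identical and usable by other tools.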

Phase 3: Evaluation. Development approaches, chosen per feature. Feature 1 (provenance integration in the DataFinder): prototyping. Feature 2 (evidential preservation): test-driven development. Feature 3 (signing data): pair programming for the concept, test-driven development in the thesis. Slide 22

Phase 3: Evaluation. Design of the extensions: each feature is evaluated on three characteristics. Usability: effectiveness, efficiency and satisfaction (example: automatic execution of a task in the process documentation). Adaptability: change in the environment (example: no provenance system is needed for the preservation). Adjustments and improvements (example: choice of signature algorithms for signing). Slide 23

Phase 3: Evaluation. The DataFinder as an electronic laboratory notebook. Implementations: documentation of a chain of events (credibility), durability/preservation. Adjustments: automatic report generation, a mobile version, graphical representation and integration of the provenance information. Slide 24

Conclusion: Impact and Future. Impact of my work: a basic use case of the DataFinder; a basic scientific documentation workflow modelled with provenance; the design of a general provenance storage system. Future work: integration of data signing into the core; marketing for the provenance system and the laboratory notebook application; publication of the provenance system (source code and design). Slide 25

Questions? Stand # Contact: Miriam Ney, DLR Simulations- und Softwaretechnik, Berlin. Email: Miriam.Ney@dlr.de. Slide 26

Phase 1: Requirements analysis. What does a scientific workflow look like? (Detail.) Slide 27

Feature 1: Origin of Data (Provenance Integration). REST interface for storing provenance information: localhost:9999/rest/provenance? @param process: String with the information about the process and its general type, e.g. "process". @param input: String encoding all inputs, e.g. "InputType~input1@29;InputType~input2@30;InputType~input2@40". @param output: String encoding the output, e.g. "OutputType~output21@29". @param actor: String encoding the actors, e.g. "ActorType~actor1;ActorType~actor2". InputType/OutputType: type defined in the model; "~", "@" and ";" are delimiters; input1: identifier of the object; 29: version of the object. Slide 28
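A client request for this interface can be sketched by assembling the parameters with the delimiters given above. The base URL comes from the slide; `provenance_request` and the tuple layout are illustrative assumptions.

```python
from urllib.parse import urlencode

# Host and port as given on the slide.
BASE = "http://localhost:9999/rest/provenance"


def provenance_request(process, inputs, outputs, actors):
    """Build the store request for the REST interface above.
    Data items are encoded as Type~identifier@version and joined
    with ';'; actors are encoded as Type~name."""
    enc = lambda items: ";".join(f"{t}~{i}@{v}" for t, i, v in items)
    params = {
        "process": process,
        "input": enc(inputs),
        "output": enc(outputs),
        "actor": ";".join(f"{t}~{n}" for t, n in actors),
    }
    return BASE + "?" + urlencode(params)
```

Note that `urlencode` percent-encodes the "@" and ";" delimiters inside the parameter values, so the server sees exactly the `Type~identifier@version` strings from the slide after decoding.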