Data Analysis Sequencing - EDNA. Olof Svensson Data Analysis Unit ISDD ESRF

Similar documents
Usage of the EDNA Framework for biology applications Jérôme Kieffer Data Analysis Unit

Analysis Programs DPDAK and DAWN

Development of a Standardized Data-Backup System for Protein Crystallography (PX-DBS) Michael Hellmig, BESSY GmbH

TEST AUTOMATION FRAMEWORK

Biomile Network - Design and Implementation

EMBL Identity & Access Management

"Charting the Course to Your Success!" MOC A Understanding and Administering Windows HPC Server Course Summary

Linking raw data with scientific workflow and software repository: some early

SOFTWARE TESTING TRAINING COURSES CONTENTS


Anar Manafov, GSI Darmstadt. GSI Palaver,

Automation and Remote Synchrotron Data Collection

User Autonomy Darren Spruce. Head of Kontrols & IT services (KITS)

SAS in clinical trials A relook at project management,

Version Control Your Jenkins Jobs with Jenkins Job Builder

Control Software at ESRF beamlines

e-science Technologies in Synchrotron Radiation Beamline - Remote Access and Automation (A Case Study for High Throughput Protein Crystallography)

MPI / ClusterTools Update and Plans

Workshop for WebLogic introduces new tools in support of Java EE 5.0 standards. The support for Java EE5 includes the following technologies:

Certified The Grinder Testing Professional VS-1165

Introduction to OpenTM2 An Open Source Solution for Translators

Course Description. Course Audience. Course Outline. Course Page - Page 1 of 5

Bioinformatics for programmers

LSKA 2010 Survey Report Job Scheduler

The Mantid Project. The challenges of delivering flexible HPC for novice end users. Nicholas Draper SOS18

Microsoft Windows PowerShell v2 For Administrators

Scheduling in SAS 9.3

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education

Arbeitspaket 3: Langzeitarchivierung von Forschungsdaten. JHOVE2 module for ROOT files 1

Use of Tcl/Tk in Railway signalling simulation and maintenance software

Linux Cluster - Compute Power Out of the Box

IBM Tivoli Workload Scheduler Integration Workbench V8.6.: How to customize your automation environment by creating a custom Job Type plug-in

HPC Wales Skills Academy Course Catalogue 2015

Kaltura Extension for IBM Connections Deployment Guide. Version: 1.0

Integrating TAU With Eclipse: A Performance Analysis System in an Integrated Development Environment

Eliminate Memory Errors and Improve Program Stability

Provisioning and Resource Management at Large Scale (Kadeploy and OAR)

Linux für bwgrid. Sabine Richling, Heinz Kredel. Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim. 27.

A QUICK OVERVIEW OF THE OMNeT++ IDE

Certified Selenium Professional VS-1083

A Day in the Life of a Cyber Tool Developer

#define. What is #define

Scheduling in SAS 9.4 Second Edition

WebLogic Server Administration

MSWL Development & Tool. Eclipse IDE

Agenda. Tango meeting : Krakow

Unit Testing webmethods Integrations using JUnit Practicing TDD for EAI projects

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

How to Configure the Workflow Service and Design the Workflow Process Templates

Fundamentals of LoadRunner 9.0 (2 Days)

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

GATECloud.net: Cloud Infrastructure for Large-Scale, Open-Source Text Processing

IBM InfoSphere MDM Server v9.0. Version: Demo. Page <<1/11>>

ns-3 development overview ns-3 GENI Eng. Conf., Nov

Documentation and Project Organization

Generating Automated Test Scripts for AltioLive using QF Test

aaps algacom Account Provisioning System

A High Performance Computing Scheduling and Resource Management Primer

Software Automated Testing

WissGrid. JHOVE2 over the Grid 1. Dokumentation. Arbeitspaket 3: Langzeitarchivierung von Forschungsdaten. Änderungen

Architecture and Mode of Operation

Exam Name: IBM InfoSphere MDM Server v9.0

1st Annual Report MAX-INF2. European Macromolecular Crystallography Infrastructure Cooperation Network

User-friendly access to Grid and Cloud resources for 18scientific th 19 th computing January / 21

Lab Management, Device Provisioning and Test Automation Software

The CMS analysis chain in a distributed environment

UPDATE MANAGEMENT SERVICE The advantage of a smooth Software distribution

MATLAB Distributed Computing Server Installation Guide. R2012a

IBM WebSphere Server Administration

WebSphere Server Administration Course

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education

User Guide. Informatica Smart Plug-in for HP Operations Manager. (Version 8.5.1)

Developing Parallel Applications with the Eclipse Parallel Tools Platform

ITG Software Engineering

EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN ACCELERATORS AND TECHNOLOGY SECTOR

Assignment # 1 (Cloud Computing Security)

RepoGuard Validation Framework for Version Control Systems

Systemd for Embedded Linux. Challenges and Opportunities

Communiqué 4. Standardized Global Content Management. Designed for World s Leading Enterprises. Industry Leading Products & Platform

VisIt Visualization Tool

Release Notes scvenus 2.5.3

JMS: A workflow management system and web-based cluster front-end for the Torque resource manager

DS-5 ARM. Using the Debugger. Version 5.7. Copyright 2010, 2011 ARM. All rights reserved. ARM DUI 0446G (ID092311)

EVALUATION ONLY. WA2088 WebSphere Application Server 8.5 Administration on Windows. Student Labs. Web Age Solutions Inc.

DevKey Documentation. Release 0.1. Colm O Connor

CLC Server Command Line Tools USER MANUAL

Administering batch environments

OpenMake Dynamic DevOps Suite 7.5 Road Map. Feature review for Mojo, Meister, CloudBuilder and Deploy+

SOFTWARE DEVELOPMENT BASICS SED

Automation and Virtualization, the pillars of Continuous Testing

Detailed Design Report

Enhanced System Integration Test Automation Tool (E-SITAT) Author: Akshat Sharma

Determine the process of extracting monitoring information in Sun ONE Application Server

Part I Courses Syllabus

CI for FPGA D&V. Continuous Integration for FPGA Design and Verification Verification Futures Alan Fitch, Ericsson TV Ltd

Enterprise Content Management System Monitor. Server Debugging Guide CENIT AG Bettighofer, Stefan

Automate Your BI Administration to Save Millions with Command Manager and System Manager

Source Control Systems

MX Data (& Sample) Handling (& Tracking) at the ESRF Gordon Leonard ESRF Macromolecular Crystallography Group

Transcription:

Data Analysis Sequencing - EDNA Olof Svensson Data Analysis Unit ISDD ESRF

Why do we need EDNA? EDNA is the best answer we (developers) have come up with so far for answering these questions : How can we automate data analysis workflows? pipeline existing scientific software for (online) data analysis workflows abstract certain calculations to be generic, e.g. indexing of a diffraction pattern flexible workflows, rapid changes depending on the scientific needs How can we make these workflows robust? easily adapted to new versions of scientific software packages How can we collaborate efficiently? re-use of code without breaking existing functionality

What is EDNA? EDNA is about collaboration: Code sharing (SVN) Coding conventions Code reviews Open source (LGPL, GPL) Bug tracker Wiki : http://www.edna-site.org Memorandum of Understanding Executive committee Project manager / coordinator Regular meetings / video conferences EDNA is a framework: Generic kernel Data modelling framework Support for multi-threaded modules (plugins) development Support for workflow development Testing framework Specific applications (MXv1, biosaxs etc.) Automatic testing and nightly builds Automatic API doc generation No GUI EDNA Modular / Plugins Data Model / UML Code Workflow model Testing Framework Project Management

EDNA Modularity : Plugins and their hierarchy Plugin base class : Configuration, working directory, etc. Execution plugins : Execution of external programs, e.g. (bash) scripts step1 step2 Execution plugin Controller plugins: Control of execution plugins Parallel execution Synchronisation step3 step4 Control plugin

Existing scientific EDNA workflows Macromolecular crystallography: Characterisation taking into account radiation damage (MOSFLM, Labelit, RADDOSE, BEST) Connection with experiment data base (ISPyB) Parallel execution of characterisation (GRID data processing) Parallel creation of image thumbnails Diffraction Computed Tomography SPD: Image correction, fast azimuthal integration Sinograms saved in HDF5 format Small Angle Scattering Image correction and fast azimuthal integration Full Field XAS Image correction (dark, flat) Image alignment (offset measurements by FFT) HDF5 output

Challenge for the ESRF Upgrade : Massively Automated Sample Selection Integrated Facility

MXv1 Characterisation v1.2 + Xtal info + beam flux + diffraction plan MOSLFM indexing LABELIT DISTL Failure Indexing Evaluation LABELIT indexing Ok MOSFLM Predictions MOSFLM integration Indexing Evaluation Ok [RADDOSE] BEST Failure Data collection plan

EDNA Testing Framework ################################################################### Result for EDTestSuiteKernel : SUCCESS Number of executed test suites in this test suite : 2 Total number of test cases executed with SUCCESS : 14 Total number of test cases executed with FAILURE : 0 Total number of test methods executed with SUCCESS : 48 Total number of test methods executed with FAILURE : 0 Runtime : 25.847 [s] ###################################################################

EDNA Testing Framework ################################################################### Result for EDTestSuitePluginUnitAll : SUCCESS Number of executed test suites in this test suite : 5 Total number of test cases executed with SUCCESS : 83 Total number of test cases executed with FAILURE : 0 Total number of test methods executed with SUCCESS : 202 Total number of test methods executed with FAILURE : 0 Runtime : 122.892 [s] ###################################################################

EDNA Testing Framework ################################################################### Result for EDTestSuitePluginExecuteAll : FAILURE Number of executed test suites in this test suite : 5 Total number of test cases executed with SUCCESS : 124 Total number of test cases executed with FAILURE : 8 OBS! The following test methods ended with failure: EDTestCasePluginExecuteControlCharForReorientationv2_0_noKAPPA_ : testexecute : Plugin failure assert: should be False, was True FAILURE: Expected different from obtained - identifier /mntdirect/_scisoft/users/svensson/tmp/edtestsuitepluginexecuteall_20110114-093729/tmppw7olu... Total number of test methods executed with SUCCESS : 129 Total number of test methods executed with FAILURE : 8 Runtime : 1108.997 [s] ###################################################################

EDNA Testing Framework Checking out EDNA Repository... to revision 2758 Tests using the default python under linux: Python 2.6.5 Launching EDTestSuiteKernel with /usr/bin/python...success! 16.411 [s] Making EDNA-kernel tarball distribution... Launching EDTestSuiteCCP4v0 with /usr/bin/python...success! 16.466 [s] Making CCP4v0 tarball distribution... Launching EDTestSuitePluginExecPlugins with /usr/bin/python...success! 920.778 [s] Making ExecPlugins tarball distribution... Launching EDTestSuiteBioSaxs with /usr/bin/python...success! 49.655 [s] Making BioSaxsv1 tarball distribution... Launching EDTestSuitePluginUnitMXPluginExec with /usr/bin/python...success! 57.916 [s] Launching EDTestSuitePluginExecuteMXPluginExec with /usr/bin/python...success! 365.850 [s] Launching EDTestSuitePluginUnitMXv1 with /usr/bin/python...success! 73.398 [s] Launching EDTestSuitePluginExecuteMXv1 with /usr/bin/python...success! 3040.207 [s] Making MXv1 tarball distribution... Tests using jython under linux: Jython 2.5.2rc2 Launching EDTestSuiteKernel with /home/tester/bin/jython...success! 80.371 [s] Tests using the default python under : WinePython 2.7.1 Launching EDTestSuiteKernel with /home/tester/bin/pythonw...success! 44.725 [s]

How EDNA will evolve in the future Common EDNA developments (Kernel): Improvements of the data model framework (in progress) Improvements of logging (in progress) Full support of Windows and MacOS (in progress) Enhanced support of grid engines / job schedulers Improved documentation (plugin use cases) Graphical workflow editor (Data Analysis Workbench) Scientific developments: MX further enhancements of characterisation (kappa, XDS etc) MX auto processing wrappers Biosaxs data analysis (EMBL Hamburg software suite) Tomography More to come...

How to run EDNA on a cluster with a job scheduler EDNA is optimised for parallel execution of processes/plugins: Thread safe parallel execution of plugins Automatic synchronisation Automatic workload limitation Implemented : EDNA TANGO server No job scheduler no load balancing Today the EDNA kernel provides a limited support for running workflows on a cluster with a job scheduler: Works only if the program is started by EDNA in a script Call to grid engine submission must be blocking (sun/oracle grid engine - sync y ) Future : support for any job scheduler (sun/oracle, torque, condor, pbs, oar etc) Synchronisation through sockets? Persistent EDNA plugin launchers?

What the EDNA GUI could look like EDNA has no GUI framework All EDNA workflows must be executable on the command line Possible solution for a generic EDNA GUI: Workflow editor: Implicit documentation of workflow Implicit parallel workflows Possibility to easily modify / construct new workflows Possibility to debug workflows Possibility to restart a stopped workflow

Documentation!

EDNA Documentation Available today : Data models (png) Automatic API doc generation Wikipages with developers' How-to s Minutes / presentations of previous meetings, code camps etc Planned : Automatic plugin documentation repository (use cases etc) Workflow documentation (workflow tool)

EDNA Collaborators 2010 Alexander Popov (e) Alun Ashton (b) Andrew Leslie (h) Andrew McCarthy (c) Andrew Thompson (k) Clemens Schulze (j) Clemens Vonrhein (f) Darren Spruce (e) Elspeth Gordon (e) Ezequiel Panepucci (j) Gérard Bricogne (f) Gerrit Langer (c) Gleb Bourenkov (c) Gordon Leonard (e) Graeme Winter (b) Harry Powell (h) Jérôme Kieffer (e) Johan Turkenburg (m) Johan Unge (g) John Skinner (i) Karl Levik (b) Katherine McAuley (b) Lucile Roussier (k) Marie-Farnçoise Incardona (e) Mark Basham (b) Meitian Wang (j) Michael Hellmig (a) Olga Roudenko (k) Peter Keller (f) Peter Turner (l) Pierre Legrand (k) Robert Sweet (i) Romeu Pieritz (e) Sandor Brockhauser (c) Sean McSweeney (e) Takashi Tomizaki (j) Thomas Schneider (c) Uwe Mueller (a ) (a) BESSY, Berlin, Germany (b) Diamond Light Source, UK (c) EMBL, Grenoble, France (d) EMBL, Hamburg, Germany (e) ESRF, Grenoble, France (f) Global Phasing, Cambridge, UK (g) MAX LAB, Lund, Sweden (h) MRC LMB, Cambridge, UK (i) NSLS, Brookhaven, U.S. (j) SLS, Villigen, Switzeland (k) Synchrotron Soleil, France (l) University of Sydney, Australia (m) University of York, UK EDNA developers Executive committee