Usage of the EDNA Framework for biology applications
Jérôme Kieffer
Data Analysis Unit



Layout
I. What is a Synchrotron?
    Online data analysis
II. EDNA Framework
    Introduction
    Strength of EDNA
III. Applications for biology based on EDNA
    MXv1: characterization of protein crystals
    AutoProc: data reduction of protein single-crystal diffraction
    Dimple: molecular replacement, searching for bound ligands
    BioSaxs: pipelines for small-angle scattering
IV. Development of EDNA
    History
    Collaboration & community

What is a Synchrotron?
[Figure labels: linear accelerator, booster, storage ring, optics hutch, experiment hutch, control hutch. Courtesy of Synchrotron Soleil]
Answer: an X-ray source, a very bright and tunable one.

Online data analysis: what for?
[Diagram labels: camera; device server (Tango); beam-line sequencer (Spec) or beam-line control GUI; data analysis server (EDNA)]
    Analyse data automatically
    Return results to the sequencer
    Avoid human interaction (fail-safe)

EDNA Framework
A pipeline tool for the development of robust on-line data analysis applications, or "Unix pipe on steroids".

Strength of EDNA
EDNA is a robust pipe-lining tool for on-line data analysis
    Written in (pure) Python and fully open-source
    It has been tested with thousands of tasks at once
EDNA allows high performance
    Multi-threaded implementation
EDNA relies on data models
    Visual communication with scientists
    Automatic bindings with the code
EDNA has a strong testing framework
    Unit & execution tests
    Continuous integration with nightly builds
EDNA is efficient to program
    Plugin generator for execution plug-ins based on the data model
    Re-use of plug-ins already written by others: EDNA-Toolbox
[Diagram labels: program; step1, step2, step3, step4; execution plug-in; control plug-in. Courtesy of Olof Svensson]
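The split between control and execution plug-ins can be sketched as follows. This is a minimal illustration in Python 3, not EDNA's API: the class names ExecutionPlugin and ControlPlugin and their execute() methods are hypothetical, and a thread pool from the standard library stands in for EDNA's own multi-threaded implementation.

```python
from concurrent.futures import ThreadPoolExecutor


class ExecutionPlugin:
    """Hypothetical execution plug-in: wraps one processing step."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def execute(self, data_input):
        # A real execution plug-in would typically launch an external program;
        # here the step is just a Python callable.
        return self.func(data_input)


class ControlPlugin:
    """Hypothetical control plug-in: chains steps and fans each one out over threads."""

    def __init__(self, steps, max_workers=4):
        self.steps = steps
        self.max_workers = max_workers

    def execute(self, inputs):
        data = list(inputs)
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            for step in self.steps:
                # pool.map() preserves the order of the inputs
                data = list(pool.map(step.execute, data))
        return data


pipeline = ControlPlugin([
    ExecutionPlugin("step1", lambda x: x + 1),
    ExecutionPlugin("step2", lambda x: x * 2),
])
results = pipeline.execute(range(5))
```

Each step only sees the output of the previous one, which is the "Unix pipe" idea; the control plug-in is the only place that knows the order of the steps.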

EDNA at a Glance
Collaboration
    ESRF (Grenoble), Diamond (Oxford), EMBL (Grenoble, Hamburg), MRC-LMB (Cambridge), CCP4 (UK mainly), Soleil (Paris), Bessy (Berlin), Max Lab (Sweden), SLS-PSI (Switzerland), Univ. Sydney, Univ. York, Global Phasing (UK)
Framework
    Many tools available in the EDNA toolbox:
        93 control plugins
        119 execution plugins
        25 other plugins
    Easy to extend: Python code (v2)
    Tested daily on Linux, Mac, Windows & Jython
    GPL or LGPL licence
EDNA applications
    MXv1/2: protein crystal characterization (ESRF, DLS, ...)
    SAXS (EMBL, ESRF, DLS)
    Darc Archiver (DLS)
    Diffraction Tomography (ESRF)
    Dimple: molecular replacement (CCP4)
    Full-Field XANES (ESRF)
    Xncf: EXAFS analysis (DLS)
    AutoProc: data reduction for MX
http://www.edna-site.org

Application of EDNA in Biology

MXv1
Porting of DNA with some enhancements
In production on 2 synchrotrons (about 10 beam-lines)
Courtesy of Olof Svensson

AutoProc: Diffraction data reduction
[Flowchart; box labels in the order they were extracted]
    Create xds_fastproc dir, copy XDS.INP to it, then launch xds_fullrun on the cluster
    Wait for files
    Change some params
    runxds() control plugin
    Parse CORRECT.LP
    Wait for files
    Upload results to ISPyB
    Control plugin: generate w/ and w/out anom
    Apply res cutoff
    Backup XPARM.XDS
    Change JOB=CORRECT
    Minimal XDS run (two boxes)
    Parse CORRECT.LP
    With anomalous scattering: generate w/ anom, merged and unmerged; res cutoffs; rbins; XScale
    Without anomalous scattering: generate w/out anom, merged and unmerged; res cutoffs; rbins; XScale
Courtesy of Thomas Boeglin
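The "Change JOB=CORRECT" and with/without anomalous steps amount to rewriting keywords in XDS.INP before a minimal re-run. The sketch below shows one way to do that on the file contents held as a string. The helper name set_xds_keyword is hypothetical, and using FRIEDEL'S_LAW (a standard XDS keyword) to switch anomalous treatment is an assumption: the slide does not say which parameters the pipeline actually changes. The helper only handles keywords that start a line, whereas XDS.INP also allows several keywords on one line.

```python
import re


def set_xds_keyword(xds_inp_text, keyword, value):
    """Return XDS.INP text with `keyword=` set to `value`.

    Active lines starting with the keyword are replaced by a single new line;
    commented lines (starting with '!') are left untouched. If the keyword is
    absent, the new line is appended.
    """
    pattern = re.compile(r"^\s*" + re.escape(keyword) + r"\s*=")
    new_line = f"{keyword}= {value}"
    out = []
    replaced = False
    for line in xds_inp_text.splitlines():
        if pattern.match(line):
            if not replaced:
                out.append(new_line)
                replaced = True
            # any further duplicate of the keyword is dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Illustrative template, not taken from the slides
template = (
    "JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT\n"
    "FRIEDEL'S_LAW= TRUE\n"
)
rerun = set_xds_keyword(template, "JOB", "CORRECT")
with_anom = set_xds_keyword(rerun, "FRIEDEL'S_LAW", "FALSE")
without_anom = set_xds_keyword(rerun, "FRIEDEL'S_LAW", "TRUE")
```

Working on a string rather than on the file keeps the original XDS.INP untouched, in the same spirit as the "Backup XPARM.XDS" box.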

Dimple: DIfference Map PipeLinE
Screening crystals for bound ligands
Made by CCP4 & Diamond
Courtesy of Graeme Winter & Ronan Keegan

BioSaxs: 3 Pipelines
[Diagram; labels grouped by pipeline as far as the extraction allows]
Process One Image
    Input: image
    Normalization, azimuthal integration (Saxs Angle), re-writing output with metadata
    Output: data file (3-column ASCII file)
Smart merge
    Input: many data files
    Radiation damage check (datcmp): merge data that are the same
    Auto-subtract buffer
    Output: averaged curve, subtracted curve
Ab-initio modelling
    Input: subtracted curve
    AutoRg (find radius of gyration), gnom, dammif, Supcomb
    Output: web page containing radius of gyration and 3D models (as PDB files)
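The "find radius of gyration" step can be illustrated with a Guinier fit, the textbook low-angle approximation ln I(q) = ln I0 - (Rg^2 / 3) q^2. This is a sketch under that assumption only: the slides do not describe how AutoRg works internally, the function name guinier_rg is hypothetical, and the data below are synthetic. A real tool would also choose the fitting range (typically q*Rg below about 1.3) automatically.

```python
import math


def guinier_rg(q, intensity):
    """Least-squares fit of ln(I) against q^2; returns (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data are not in a Guinier regime")
    # slope = -Rg^2 / 3, intercept = ln(I0)
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free curve in arbitrary but consistent units
true_rg, true_i0 = 2.5, 100.0
q = [0.01 * k for k in range(1, 41)]  # q * Rg goes up to 1.0
intensity = [true_i0 * math.exp(-((qi * true_rg) ** 2) / 3.0) for qi in q]
rg, i0 = guinier_rg(q, intensity)
```

On noise-free data the fit recovers the input Rg and I0 to within floating-point error; with measured curves the result depends strongly on the chosen q range.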

Back to EDNA developments

History of EDNA
2000: Development of DNA (auto-indexing & strategy)
2007: Development of DNA2 based on Python + AALib + data models
    3 developers at ESRF + BioStruct funding
2009: Publication of EDNA (J. Synchrotron Rad.)
    http://dx.doi.org/10.1107/s0909049509036681
    Down to 2 developers, no more extra funding
    Start spreading outside the MX community (Diffraction Tomography)
2010: Code camps & collaborations
    Many new projects: Tomography, Dimple, BioSaxs, Exafs, ...
2011: Competition with DAWB
    Year of financial scarcity (2 x 0.5 developers)
2012: AutoProc
    New developer on automatic data reduction for protein crystallography

Development & user base
EDNA is a scientific collaboration, mainly between ESRF and Diamond Light Source
Infrastructure
    Dedicated server in a no man's land (dedibox.free.fr)
        Web server, bug tracker, continuous integration, ...
    Git version control hosted at GitHub
        Moved away from self-hosted SVN
    Distribution: tarball to unzip
How many users/developers?
    Thousands of scientists are using EDNA applications
        Mainly at ESRF & Diamond Light Source
    Hundreds of people know what EDNA is about
    A handful of actual developers, many with little commitment
    Kernel managed by two people

Participants in EDNA
Alexander Popov (e), Alun Ashton (b), Andrew Leslie (h), Andrew McCarthy (c), Andrew Thompson (k), Clemens Schulze (j), Clemens Vonrhein (f), Darren Spruce (e), Elspeth Gordon (e), Ezequiel Panepucci (j), Gérard Bricogne (f), Gerrit Langer (c), Gleb Bourenkov (c), Gordon Leonard (e), Graeme Winter (b), Harry Powell (h), Irakli Sikharulidze (b), Jérôme Kieffer (e), Johan Turkenburg (m), Johan Unge (g), John Skinner (i), Karl Levik (b), Katherine McAuley (b), Lucile Roussier (k), Marie-Françoise Incardona (e), Mark Basham (b), Meitian Wang (j), Michael Hellmig (a), Olof Svensson (e), Olga Roudenko (k), Peter Keller (f), Peter Turner (l), Pierre Legrand (k), Robert Sweet (i), Romeu Pieritz (e), Ronan Keegan (n), Sandor Brockhauser (c), Sean McSweeney (e), Takashi Tomizaki (j), Thomas Schneider (c), Thomas Boeglin (e), Uwe Mueller (a)
Institutions
    (a) BESSY, Berlin, Germany
    (b) Diamond Light Source, UK
    (c) EMBL, Grenoble, France
    (d) EMBL, Hamburg, Germany
    (e) ESRF, Grenoble, France
    (f) Global Phasing, Cambridge, UK
    (g) MAX LAB, Lund, Sweden
    (h) MRC LMB, Cambridge, UK
    (i) NSLS, Brookhaven, U.S.
    (j) SLS, Villigen, Switzerland
    (k) Synchrotron Soleil, France
    (l) University of Sydney, Australia
    (m) University of York, UK
    (n) CCP4 / STFC
Current developers are in bold (committed code within last year)

Code analysis
Analysis since migration to GitHub (2011)
* contains 2/3 of generated code (XSData).
* XML is mainly for tests

Drawbacks from its strengths
Typical example of the "design by committee" anti-pattern
Never designed to be installed or distributed
    Lives in its own place
Relies on many external programs
    Many are proprietary, with licensing issues
    Integration within EDNA needs additional configuration
API is too complicated
    Learning curve is very steep / added value is not clear
    EDNA was never adopted by scientists
Framework based only on inheritance
    Used to have up to 13 levels of inheritance
Issues with the programming language
    EDNA is multi-threaded but limited by the GIL in CPython
    Java-ish style repels Python developers
        Data modelling, data binding, coding conventions, ...
    Python code repels Java developers

Conclusions
EDNA: a framework for developing on-line data analysis (ODA) applications
    Original funding from BioStruct
About 10 applications use the framework
    Half of them are related to biological applications
Synchrotrons rely on it:
    10 beam-lines at ESRF (25%) depend on it
    Heavy use at Diamond (UK)
Acknowledgements:
    Andrew Leslie (Executive Committee Chair)
    Olof Svensson (EDNA project manager)
    Thomas Boeglin


Is it worth the effort?
"I don't think it takes 6 months to learn how to make EDNA plugins, it certainly takes 6 months to get into the kernel and being able to propose improvements to this. I agree for a short term project EDNA is not worth the effort."
Olof Svensson (EDNA project manager)