Usage of the EDNA Framework for biology applications
Jérôme Kieffer
Data Analysis Unit
Layout
I. What is a Synchrotron?
  Online data analysis
II. EDNA Framework
  Introduction
  Strength of EDNA
III. Applications for biology based on EDNA
  MXv1: Characterization of protein crystals
  AutoProc: Data reduction of protein single crystal diffraction
  Dimple: Molecular replacement searching for bound ligands
  BioSaxs: Pipelines for small angle scattering
IV. Development of EDNA
  History
  Collaboration & Community
What is a Synchrotron?
Answer: An X-ray source, a very bright and tunable one
Diagram labels: Linear accelerator, Booster, Storage ring, Optics hutch, Experiment hutch, Control hutch
Courtesy of Synchrotron Soleil
Online data analysis: what for?
  Analyse data automatically
  Return results to the sequencer
  Avoid human interaction (fail-safe)
Diagram labels: Camera, Device server (Tango), Beam-line sequencer (Spec) or beam-line control GUI, Data analysis server (EDNA)
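The fail-safe idea on this slide can be sketched as follows. This is only an illustration, not EDNA or Tango code: `analyse_frame`, `safe_analysis` and the layout of the result dictionary are hypothetical names chosen for the example. The point is that the caller (the sequencer) always gets an answer back, even when the analysis itself fails.

```python
import traceback


def analyse_frame(pixels):
    """Hypothetical analysis step: mean intensity of one frame."""
    if not pixels:
        raise ValueError("empty frame")
    return sum(pixels) / len(pixels)


def safe_analysis(pixels):
    """Run the analysis and always return a result dictionary.

    Exceptions are caught and reported instead of propagating, so the
    caller (the sequencer in this sketch) never has to handle a crash.
    """
    try:
        return {"status": "ok", "mean": analyse_frame(pixels), "error": None}
    except Exception:
        return {"status": "failed", "mean": None,
                "error": traceback.format_exc()}


good = safe_analysis([10, 20, 30])
bad = safe_analysis([])
```

Reporting the traceback as data rather than raising it lets the data collection continue while the failure is still logged for later inspection.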
EDNA Framework
A pipeline tool for the development of robust on-line data analysis applications
or
"Unix pipe on steroids"
Strength of EDNA
EDNA is a robust pipe-lining tool for on-line data analysis
  Written in (pure) Python and fully open-source
  It has been tested with thousands of tasks at once
EDNA allows high performance
  Multi-threaded implementation
EDNA relies on data models
  Visual communication with scientists
  Automatic bindings with the code
EDNA has a strong testing framework
  Unit & execution tests
  Continuous integration with nightly builds
EDNA is efficient to program
  Plugin generator for execution plug-ins based on the data model
  Re-use of plug-ins already written by others: EDNA-Toolbox
Diagram labels: program, execution plug-in, control plug-in, step1, step2, step3, step4
Courtesy of Olof Svensson
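The relation between control plug-ins, execution plug-ins and data models can be sketched in a few lines of Python. This is not the EDNA API: `StepInput`, `StepOutput`, `ExecutionStep` and `ControlPipeline` are hypothetical names, and the "programs" are plain functions. The sketch only shows typed data passed from step to step, with several pipeline runs executed in threads.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class StepInput:
    """Hypothetical data model for the input of an execution step."""
    values: list


@dataclass
class StepOutput:
    """Hypothetical data model for the output of an execution step."""
    values: list


class ExecutionStep:
    """Wraps one unit of work (here a function standing in for a program)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self, data: StepInput) -> StepOutput:
        return StepOutput([self.func(v) for v in data.values])


class ControlPipeline:
    """Chains execution steps: the output of one is the input of the next."""

    def __init__(self, steps):
        self.steps = steps

    def run(self, data: StepInput) -> StepOutput:
        out = StepOutput(list(data.values))
        for step in self.steps:
            out = step.run(StepInput(out.values))
        return out


pipeline = ControlPipeline([
    ExecutionStep("step1", lambda v: v + 1),
    ExecutionStep("step2", lambda v: v * 2),
])

# Process several independent inputs concurrently, one pipeline run per thread.
jobs = [StepInput([1, 2]), StepInput([10]), StepInput([])]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(pipeline.run, jobs))
```

Threads suit this pattern when each step mostly waits on an external program or on file I/O, which is the typical case for an execution plug-in wrapping a command-line tool.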
EDNA at a Glance
Collaboration
  ESRF (Grenoble), Diamond (Oxford), EMBL (Grenoble, Hamburg), MRC-LMB (Cambridge), CCP4 (UK mainly), Soleil (Paris), Bessy (Berlin), Max Lab (Sweden), SLS-PSI (Switzerland), Univ. Sydney, Univ. York, Global Phasing (UK)
Framework
  Many tools available in the EDNA toolbox:
    93 control plugins
    119 execution plugins
    25 other plugins
  Easy to extend: Python code (v2)
  Tested daily on Linux, Mac, Windows & Jython
  GPL or LGPL licence
EDNA applications
  MXv1/2: protein crystal characterization (ESRF, DLS, ...)
  SAXS (EMBL, ESRF, DLS)
  Darc: archiver (DLS)
  Diffraction Tomography (ESRF)
  Dimple: molecular replacement (CCP4)
  Full-Field XANES (ESRF)
  Xncf: EXAFS analysis (DLS)
  AutoProc: data reduction for MX
http://www.edna-site.org
Application of EDNA in Biology
MXv1
  Porting of DNA with some enhancements
  In production on 2 synchrotrons (about 10 beam-lines)
Courtesy of Olof Svensson
AutoProc: Diffraction data reduction
Flowchart boxes:
  Create xds_fastproc dir, copy XDS.INP to it, then launch xds_fullrun on the cluster
  Wait for files
  Change some params
  runxds() control plugin
  Parse CORRECT.LP
  Wait for files
  Upload results to ISPyB
  Control plugin: generate with and without anom
  Apply res cutoff
  Backup XPARM.XDS, change JOB=CORRECT
  Minimal XDS run (with anomalous scattering)
  Minimal XDS run (without anomalous scattering)
  Parse CORRECT.LP
  Generate with anom, merged and unmerged: res cutoffs, rbins, XScale
  Generate without anom, merged and unmerged: res cutoffs, rbins, XScale
Courtesy of Thomas Boeglin
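The "Change JOB=CORRECT" box amounts to editing one keyword line of XDS.INP before re-running XDS. A minimal sketch of that edit is given below. It is not the AutoProc plug-in code: `set_xds_job` is a hypothetical helper, and the sample input is a placeholder rather than a complete XDS.INP.

```python
def set_xds_job(xds_inp_text, jobs):
    """Return XDS.INP content with all JOB= lines replaced by one new line.

    Lines whose first non-blank characters are "JOB=" are dropped and a
    single "JOB= ..." line is put in the position of the first one (or
    appended at the end if there was none). Comment lines starting with
    "!" are left untouched.
    """
    new_line = "JOB= " + " ".join(jobs)
    out = []
    replaced = False
    for line in xds_inp_text.splitlines():
        if line.lstrip().upper().startswith("JOB="):
            if not replaced:
                out.append(new_line)
                replaced = True
            continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


original = (
    "! example input, values are placeholders\n"
    "JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT\n"
    "FRIEDEL'S_LAW= FALSE\n"
)
updated = set_xds_job(original, ["CORRECT"])
```

Working on the text and returning a new string keeps the original file content available, which fits the slide's habit of backing up files (such as XPARM.XDS) before a minimal re-run.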
Dimple: DIfference Map PipeLinE
  Screening crystals for bound ligands
  Made by CCP4 & Diamond
Courtesy of Graeme Winter & Ronan Keegan
BioSaxs: 3 Pipelines
Process One Image
  Input: image
  Normalization
  Azimuthal integration
  Saxs Angle
  Re-writing output with metadata
  Output: data file (3-column ASCII file)
Smart merge
  Input: many data files
  datcmp: radiation damage check, merge data that are the same
  Auto Subtract: subtract buffer
  AutoRg: find radius of gyration
  Output: averaged curve, subtracted curve
Ab-initio modelling
  Input: subtracted curve
  gnom
  dammif
  Supcomb
  Output: web page containing radius of gyration and 3D models (as PDB files)
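The radius of gyration reported by the pipeline comes from the low-angle part of the subtracted curve. As background, the Guinier approximation states that I(q) ≈ I(0)·exp(−q²·Rg²/3) for small q (commonly q·Rg below about 1.3), so ln I is linear in q². The sketch below fits that line with plain least squares. It is not the AutoRg algorithm, which also selects the fitting range automatically; `guinier_fit` is a hypothetical helper and the data are synthetic.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares fit of ln(I) = ln(I0) - (Rg**2 / 3) * q**2.

    Returns (Rg, I0). Assumes all intensities are positive and that the
    points given already lie in the Guinier region.
    """
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: not a Guinier region")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free curve with Rg = 30 and I0 = 100 (arbitrary units).
true_rg, true_i0 = 30.0, 100.0
q = [0.005 + 0.002 * k for k in range(15)]
intensity = [true_i0 * math.exp(-(qi * true_rg) ** 2 / 3.0) for qi in q]
rg, i0 = guinier_fit(q, intensity)
```

On real data the choice of the q-range matters more than the fit itself, which is why an automatic tool is used in the pipeline.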
Back to EDNA developments
History of EDNA
2000: Development of DNA (auto-indexing & strategy)
2007: Development of DNA2 based on Python + AALib + data models
  3 developers at ESRF + BioStruct funding
2009: Publication of EDNA (J. Synchrotron Rad.)
  http://dx.doi.org/10.1107/s0909049509036681
  Down to 2 developers, no more extra funding
  Start spreading outside the MX community (Diffraction Tomography)
2010: Code camps & collaborations
  Many new projects: Tomography, Dimple, BioSaxs, Exafs, ...
2011: Competition with DAWB
  Year of financial scarcity (2 x 0.5 developers)
2012: AutoProc
  New developer on automatic data reduction for protein crystallography
Development & user base
EDNA is a scientific collaboration, mainly between ESRF and Diamond Light Source
Infrastructure
  Dedicated server in a no man's land (dedibox.free.fr)
    Web server, bug tracker, continuous integration, ...
  Git version control hosted at GitHub
    Moved away from self-hosted SVN
  Distribution: tarball to unzip
How many users/developers?
  Thousands of scientists are using EDNA applications
    Mainly at ESRF & Diamond Light Source
  Hundreds of people know what EDNA is about
  A handful of actual developers, many with little commitment
  Kernel managed by two people
Participants in EDNA
Alexander Popov (e), Alun Ashton (b), Andrew Leslie (h), Andrew McCarthy (c), Andrew Thompson (k), Clemens Schulze (j), Clemens Vonrhein (f), Darren Spruce (e), Elspeth Gordon (e), Ezequiel Panepucci (j), Gérard Bricogne (f), Gerrit Langer (c), Gleb Bourenkov (c), Gordon Leonard (e), Graeme Winter (b), Harry Powell (h), Irakli Sikharulidze (b), Jérôme Kieffer (e), Johan Turkenburg (m), Johan Unge (g), John Skinner (i), Karl Levik (b), Katherine McAuley (b), Lucile Roussier (k), Marie-Françoise Incardona (e), Mark Basham (b), Meitian Wang (j), Michael Hellmig (a), Olof Svensson (e), Olga Roudenko (k), Peter Keller (f), Peter Turner (l), Pierre Legrand (k), Robert Sweet (i), Romeu Pieritz (e), Ronan Keegan (n), Sandor Brockhauser (c), Sean McSweeney (e), Takashi Tomizaki (j), Thomas Schneider (c), Thomas Boeglin (e), Uwe Mueller (a)
Institutions
  (a) BESSY, Berlin, Germany
  (b) Diamond Light Source, UK
  (c) EMBL, Grenoble, France
  (d) EMBL, Hamburg, Germany
  (e) ESRF, Grenoble, France
  (f) Global Phasing, Cambridge, UK
  (g) MAX LAB, Lund, Sweden
  (h) MRC LMB, Cambridge, UK
  (i) NSLS, Brookhaven, U.S.
  (j) SLS, Villigen, Switzerland
  (k) Synchrotron Soleil, France
  (l) University of Sydney, Australia
  (m) University of York, UK
  (n) CCP4 / STFC
Current developers are in bold (committed code within last year)
Code analysis
Analysis since migration to GitHub (2011)
  * contains 2/3 of generated code (XSData).
  * XML is mainly for tests
Drawbacks of its strengths
Typical example of the "design by committee" anti-pattern
Never designed to be installed or distributed
  Lives in its own place
Relies on many external programs
  Many are proprietary, with licensing issues
  Integration within EDNA needs additional configuration
API is too complicated
  Learning curve is very steep / added value is not clear
  EDNA was never adopted by scientists
Framework based only on inheritance
  Used to have up to 13 levels of inheritance
Issue with programming language
  EDNA is multi-threaded but limited by the GIL in CPython
  Java-ish style repels Python developers
    Data-modelling, data-binding, coding convention, ...
  Python code repels Java developers
Conclusions
EDNA: a framework for developing online data analysis (ODA) applications
  Original funding from BioStruct
  About 10 applications use the framework, half of them related to biology
Synchrotrons rely on it:
  10 beam lines at ESRF (25%) depend on it
  Heavy use at Diamond (UK)
Acknowledgement:
  Andrew Leslie (Executive Committee Chair)
  Olof Svensson (EDNA project manager)
  Thomas Boeglin
Is it worth the effort?
"I don't think it takes 6 months to learn how to make EDNA plugins; it certainly takes 6 months to get into the kernel and to be able to propose improvements to it. I agree that for a short-term project EDNA is not worth the effort."
Olof Svensson (EDNA project manager)