Analysis Programs DPDAK and DAWN An Overview Gero Flucke FS-EC PNI-HDRI Spring Meeting April 13-14, 2015
Outline Introduction Overview of Analysis Programs: DPDAK DAWN Summary Gero Flucke (DESY) Analysis Programs DPDAK and DAWN April 13, 2015 2 / 38
Introduction: Background. Data rates in modern X-ray experiments are rising. Raw data might not be exportable to users' institutes. Online (or near-real-time) analysis becomes more important for efficient use of beam time. Standardisation of the data format is ongoing: NeXus. Provide support for data analysis tools.
Introduction (ctd.) PaN-Data WP D5.3: Investigate Existing Solutions. Two suitable programs with different strengths: DPDAK: Set up flexible tool chains for online and offline data processing and analysis, 1D/2D visualisation, e.g. for monitoring, easy to extend. DAWN: Generic data browsing, slicing and 1D/2D/3D visualisation, rich set of tools for ROIs, profiling, fitting, etc., toolbox for everything. My Involvement: DPDAK: After the core developer left (Jan 2014), I took over maintenance and development last summer. DAWN: I am the contact person at DESY PETRA III, doing tests and bug reports plus providing two minor contributions.
DPDAK Directly Programmable Data Analysis Kit
DPDAK Directly Programmable Data Analysis Kit: Open source Python program using standard packages: NumPy, SciPy, matplotlib, wxPython, FabIO, h5py, pyFAI. Cooperation between DESY (PETRA III P03) and MPI KG; (online) analysis of 2D scattering data. Windows, Linux. Core idea: sequential data processing and visualisation. This talk covers version 1.1.0, whose release is imminent. J. Appl. Cryst. 47, 1797 (2014) [doi:10.1107/s1600576714019773].
DPDAK Concept and Core: Minimalistic start-up GUI. Three plugin types: data processing steps, GUI tools for data display etc., data export. Storage of processed data: in-memory database; types: scalars, 1D/2D arrays, strings, file/directory paths; for images just their paths.
DPDAK: Configuration of Processing Chain Select plugins from list. Select input from output of other plugins (type match ensured). Set parameters.
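The type matching mentioned on this slide can be pictured as a simple filter over the outputs that are already available in the chain. The sketch below is only an illustration: the function `compatible_outputs`, the plugin names and the type labels are hypothetical and not taken from DPDAK.

```python
def compatible_outputs(wanted_type, available):
    """Return (plugin, output) pairs whose type equals the wanted input type.

    `available` maps a plugin name to a dict {output name: type label}.
    All names and type labels here are made up for illustration.
    """
    return [
        (plugin, output)
        for plugin, outputs in available.items()
        for output, out_type in outputs.items()
        if out_type == wanted_type
    ]


# Outputs offered by two hypothetical plugins earlier in the chain.
available = {
    "Image Loader": {"image_path": "path", "exposure": "scalar"},
    "Integration": {"q": "array1d", "intensity": "array1d"},
}
choices = compatible_outputs("array1d", available)
```

Only the entries in `choices` would be offered as input for a plugin that expects a 1D array, so a mismatched connection cannot be selected in the first place.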
DPDAK Processing: Data input via ordinary plugins, which are asked to provide the n-th data item. Start, pause or stop the processing chain interactively. Processing stops once the input plugin has delivered all its data. If the n-th data item is not found, retry (for online use). Store database and configuration for later reload (Python cPickle). Can run in batch mode without GUI.
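The request-and-retry logic of this slide can be sketched as a small loop. This is a minimal illustration under stated assumptions, not DPDAK's actual code: `run_chain`, `fetch_item`, `max_retries` and `wait_s` are hypothetical names, and the convention that the input callback returns `None` for a missing item and raises `StopIteration` when it is done is an assumption made for this example.

```python
import time


def run_chain(fetch_item, process, max_retries=3, wait_s=0.0):
    """Process items 0, 1, 2, ... until the input is exhausted.

    fetch_item(n) is a hypothetical input callback: it returns the n-th
    data item, returns None if that item does not exist (yet), and raises
    StopIteration once all items have been delivered.
    """
    results = []
    n = 0
    while True:
        retries = 0
        while True:
            try:
                item = fetch_item(n)
            except StopIteration:
                return results          # input is done: stop the chain
            if item is not None:
                break                   # got the n-th item
            retries += 1
            if retries > max_retries:
                return results          # give up waiting (assumed policy)
            time.sleep(wait_s)          # online case: data may still arrive
        results.append(process(item))
        n += 1
```

In an online setting `wait_s` would be chosen to match the expected delay between frames; the give-up policy after `max_retries` is one possible choice, an interactive stop being another.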
DPDAK Parallel Processing: Enabled using Python's multiprocessing module. Number of threads is an option, via GUI or command line. Each processing thread instantiates all plugins and runs on a well-defined subset of the data.
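How such a split might look is sketched below. Note that Python's `multiprocessing` module works with separate processes rather than threads. The interleaved split in `subset_for_worker`, the function names and the squaring step that stands in for the plugin chain are all assumptions made for this illustration, not DPDAK's implementation.

```python
import multiprocessing


def subset_for_worker(n_items, n_workers, worker_id):
    """Indices handled by one worker: every n_workers-th item (assumed split)."""
    return list(range(worker_id, n_items, n_workers))


def process_subset(args):
    """Run in each worker: build its own chain, then process its subset."""
    n_items, n_workers, worker_id = args
    chain = [lambda x: x * x]           # stand-in for instantiating all plugins
    out = {}
    for n in subset_for_worker(n_items, n_workers, worker_id):
        value = n
        for step in chain:
            value = step(value)
        out[n] = value
    return out


def run_parallel(n_items, n_workers):
    """Distribute the items over n_workers processes and merge the results."""
    jobs = [(n_items, n_workers, w) for w in range(n_workers)]
    with multiprocessing.Pool(n_workers) as pool:
        partial = pool.map(process_subset, jobs)
    merged = {}
    for part in partial:
        merged.update(part)
    return merged
```

`run_parallel(10, 3)` should be called from a script protected by `if __name__ == "__main__":`, as usual with `multiprocessing`. Because every worker builds its own chain, no plugin instance has to be shared or sent between processes.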
DPDAK Input Data Formats: Images: using the FabIO library from ESRF (tif, edf, ...), numpy, mar300. Fio text files (DESY format). Two-column text (chi) files. 2D data out of a 3D HDF5 stack: as 2D array (but most plugins have an image path as input), or converted to an image (tif) file. New plugins are needed for other data formats (but that is rather easy to do).
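To give an idea of how little code a simple text reader needs, here is a minimal sketch of a parser for two-column text. It is not DPDAK's reader: the function name `read_two_column` is hypothetical, and treating every line that does not consist of exactly two numbers as header or comment is an assumption.

```python
def read_two_column(text):
    """Parse two-column text into (x, y) lists, skipping non-numeric lines."""
    xs, ys = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue                    # header, comment or blank line
        try:
            x, y = float(parts[0]), float(parts[1])
        except ValueError:
            continue                    # two words, but not two numbers
        xs.append(x)
        ys.append(y)
    return xs, ys


# Made-up sample content with two header lines.
sample = "Some header\n2-Theta Angle\n  1.0  10.5\n  2.0  11.0\n\n"
x_values, y_values = read_two_column(sample)
```

An input plugin for a new format would wrap a reader like this and hand out the n-th file's arrays on request.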
DPDAK Tools: Choose from menu. Each opens in a separate window, to show processed data or to provide e.g. powder diffraction calibration. During data processing: used for monitoring, since the tools are notified to update regularly.
DPDAK Tools Image Display Scroll through processed images. Or directly open an image file. Displays sector ROI. Edit ROI coordinates.
DPDAK Tools Interplay: Tools and Processing Parameters Tools notified if parameters of processing plugins edited. Tools can set parameters of processing plugins. Example: inner and outer radius of sector ROI.
DPDAK Tools 1D Plots Select x and y from output of processing plugins. If array: show result of single processed data item, or stack of all. If scalar: show vs. index of processed data item, or vs. other scalar.
DPDAK Tools 2D Color Plot 1D distribution of each processed data item (frame). Attach one after another to create 2D colour plot.
DPDAK Fitting. Processing Plugin: Restrict fit range. Configure model as sum of function components (e.g. Pseudo-Voigt). Set (fixed) start values. Peak Fit Display Tool: Original distribution (highlight fit region). Fitted curve and its components. Can show function with start parameters.
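A model built as a sum of function components can be sketched as follows. The pseudo-Voigt is written in its common peak-height form, a weighted sum of a Lorentzian and a Gaussian with the same full width at half maximum; whether DPDAK uses exactly this parametrisation is not stated in the talk, so the parameter names (`amplitude`, `center`, `fwhm`, `eta`) and the helper functions are assumptions.

```python
import math


def pseudo_voigt(x, amplitude, center, fwhm, eta):
    """Peak-height pseudo-Voigt: eta * Lorentzian + (1 - eta) * Gaussian."""
    t = (x - center) / fwhm
    gauss = math.exp(-4.0 * math.log(2.0) * t * t)
    lorentz = 1.0 / (1.0 + 4.0 * t * t)
    return amplitude * (eta * lorentz + (1.0 - eta) * gauss)


def model(x, components, offset=0.0):
    """Sum of components; each is a (function, parameter dict) pair."""
    return offset + sum(func(x, **params) for func, params in components)


def in_fit_range(xs, ys, x_min, x_max):
    """Restrict the data to the fit range [x_min, x_max]."""
    return [(x, y) for x, y in zip(xs, ys) if x_min <= x <= x_max]


# Two peaks with made-up start values.
components = [
    (pseudo_voigt, {"amplitude": 2.0, "center": 5.0, "fwhm": 1.0, "eta": 0.3}),
    (pseudo_voigt, {"amplitude": 1.0, "center": 8.0, "fwhm": 0.5, "eta": 0.5}),
]
start_curve = [model(x / 10.0, components) for x in range(0, 101)]
```

Evaluating `model` with the start values, as in `start_curve`, corresponds to showing the function with start parameters before the fit is run; the minimisation itself is left out of this sketch.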
DPDAK Export Export Plugins Can access complete database. User dialogues may specify what to export to which file. Generic plugins to export data (except 2D) to text files: Single Plugin Text Export, DB Text Export. Could add further plugins for further formats.
DPDAK: Extendability. User Plugins: DPDAK is easily extendable: (almost) everything is a plugin. New plugins can be added: write a Python class inheriting from one of the plugin base classes (processing, tools, export), put the code into a directory, add this directory to the list of user plugin directories. User plugins are treated exactly like DPDAK's core plugins.
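The pattern of inheriting from a plugin base class can be illustrated with a small sketch. The base class `ProcessingPlugin`, its attributes and the `discover_plugins` helper are hypothetical stand-ins invented for this example; DPDAK's real base classes and their interface are not shown in the talk.

```python
class ProcessingPlugin:
    """Hypothetical stand-in for a plugin base class (not DPDAK's real API)."""

    name = "unnamed"
    inputs = {}     # input name -> expected type
    outputs = {}    # output name -> produced type

    def process(self, **inputs):
        raise NotImplementedError


class SumIntensity(ProcessingPlugin):
    """Example user plugin: reduce a 1D array to one scalar."""

    name = "Sum Intensity"
    inputs = {"profile": list}
    outputs = {"total": float}

    def process(self, profile):
        return {"total": float(sum(profile))}


def discover_plugins(namespace, base=ProcessingPlugin):
    """Collect all subclasses of `base` found in a module-like namespace."""
    return {
        obj.name: obj
        for obj in namespace.values()
        if isinstance(obj, type) and issubclass(obj, base) and obj is not base
    }
```

A host program could apply something like `discover_plugins` to every module found in the user plugin directories, which is one way user plugins end up in the same list as the core ones.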
DPDAK: Outlook. User Defined Fit Functions: So far only a fixed set of functions is available. Using the mechanism of user plugin directories, it is easy to allow users to extend the set. In-Memory Database: All output of all processing plugins run for all frames is kept in memory, which is not scalable. A prototype of a replacement by an HDF5 file back-end exists. Stored Configuration not Human Readable: Batch processing needs the DPDAK GUI to edit the configuration. No problem in principle to create a text version: the logic is already established for the HDF5 file back-end.
DAWN Data Analysis WorkbeNch
DAWN Data Analysis WorkbeNch Open source Java program, based on Eclipse RCP. Mainly by Diamond, contributions from ESRF and others. Implements sophisticated support for: Visualization of data in 1D, 2D and 3D, Python script development, debugging and execution, Workflows for analyzing scientific data calling Python and binary codes. By and for the synchrotron community - overlap with other communities like neutron scattering, photon science, etc. Windows, Linux, (Mac OS). J. Synchr. Rad. 22 (2015) [doi:10.1107/s1600577515002283]
DAWN: Core Concepts. Data and File Formats: Abstract dataset class like numpy arrays in Python. Plugin approach to load data of various formats: images, .edf, ASCII, HDF5/NeXus, .fio, ... DAWN finds out which loader is needed. Lazy loading of user-selected sections of large datasets. Visualisation: Plotting System: Line graphs, scatter plots, images, surfaces, ... Includes ROIs: lines, rectangles, sectors, ... Includes tools to act on displayed data: fitting, derivatives, profiles, masking, ...
DAWN: 1D Visualisation, Expressions Choose x and y data using up to two y scales. Define expression: virtual dataset as function of others.
DAWN: Slicing, 2D Visualisation Select indices (or ranges) of multi-dimensional datasets. Slice across stack of images.
DAWN: Surface Plot Live 3D view (instantly updated): Rotate as you want. Select a box.
DAWN: Hyper3D Box and Line Image on the left is average of images selected by the blue shaded area on the right. Red curve on the right displays, as a function of the image index, the average of the region of interest selected by the red square on the left.
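The two quantities shown in this view can be written down in a few lines. The sketch below uses plain nested Python lists for clarity and is only an illustration of the arithmetic; it is not DAWN's (Java) implementation, and the function names `mean_image` and `roi_trace` are hypothetical.

```python
def mean_image(stack, first, last):
    """Pixel-wise average of images first..last (stack[i][row][col])."""
    selected = stack[first:last + 1]
    n_rows, n_cols = len(selected[0]), len(selected[0][0])
    return [
        [sum(img[r][c] for img in selected) / len(selected) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def roi_trace(stack, row0, row1, col0, col1):
    """Average over the box rows row0..row1, cols col0..col1 for every image."""
    n_pix = (row1 - row0 + 1) * (col1 - col0 + 1)
    return [
        sum(img[r][c] for r in range(row0, row1 + 1) for c in range(col0, col1 + 1))
        / n_pix
        for img in stack
    ]


# Made-up stack of four 2x2 images.
stack = [[[i, i], [i, 2 * i]] for i in range(4)]
left_image = mean_image(stack, 1, 2)        # average over image indices 1..2
right_curve = roi_trace(stack, 1, 1, 0, 1)  # ROI: second row, both columns
```

`left_image` corresponds to the averaged image and `right_curve` to the curve plotted against the image index.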
DAWN: Profiling. Various image profiling tools: Line, box, radial, azimuthal, ... Profiles update in real time when the region of interest is moved.
DAWN: Line Profile Masked Mask taken into account in profiles.
DAWN: Peak Fitting Interactively define range and maximum number of peaks to be fitted (can be many!).
DAWN: Partitioning of Functionality. Perspectives: Synchrotron data and its analysis are very diverse. A single entry point to all features swamps the user: group functionality in perspectives. Re-use common functionality (data, visualisation): the user gets easily accustomed to it.
DAWN: Perspectives. Generic Core Perspectives: Data Browsing/DEXPLORE: view 1D and 2D data, slices/subsets of data of any dimensionality, expressions to apply mathematical calculations creating virtual datasets. Trace: simplified working with line traces from multiple files. Workflow: graphically design data processing steps (similar to LabVIEW), can call Python code. Processing: set up a chain of data processing steps. PyDev: IDE to develop and debug Python/Jython scripts.
DAWN: Data Browsing Perspective a) explorer b) data/slice selection c) plot (incl. sector ROI, mask) d) colour mapping e) radial profile f) result of fitted peaks
DAWN Generic Data Processing: Two approaches Workflows: Graphically compose processing using actors, + can also use actors running (C)Python code, - limited (documentation of) workflow control, - graphical output (e.g. for monitoring) not supported. New: processing perspective - so far for diffraction only.
DAWN: Python Development DAWN includes PyDev, a Python IDE for Eclipse.
DAWN: More Perspectives. Specific Science Cases: Powder Diffraction Calibration; PEEMA: Photo-emission electron microscopy analysis; Tomography Reconstruction; NCD (Non-Crystalline Diffraction) Data Calibration/Reconstruction; XAFS (X-ray absorption fine-structure); ...
DAWN: Diffraction Calibration
Summary: At PETRA III we are going to make both DPDAK and DAWN centrally available. DPDAK: Flexible tool chains for online and offline data monitoring, processing and analysis. DAWN: Generic data browsing (HDF5/NeXus), visualisation and analysis.
Backup
DPDAK General Image Configuration: rotation, axis flipping, masks, background.