Data-analysis scheme and infrastructure at the X-ray free electron laser facility, SACLA
T. Sugimoto, T. Abe, Y. Joti, T. Kameshima, K. Okada, M. Yamaga, R. Tanaka (Japan Synchrotron Radiation Research Institute); T. Hatsui, M. Yabashi (RIKEN SPring-8 Center)
What is an X-ray Free Electron Laser?
An X-ray Free Electron Laser (XFEL) delivers a 100% coherent, laser-like beam at hard X-ray wavelengths, with a peak brilliance about 10^9 times that of synchrotron radiation (SR). At a wavelength of 0.06 nm, SACLA reaches the atomic scale (~0.1 nm). [Figure: the electromagnetic spectrum from THz light through IR, visible, UV, EUV, and X-ray to hard X-ray, with matching length scales from hair, bacteria, and viruses down to proteins and atoms.]
2015-04-13 CHEP2015
SPring-8 and SACLA
SACLA is an X-ray Free Electron Laser (XFEL) facility located on the SPring-8 campus in western Japan. [Aerial view: SPring-8, the SR facility (8-GeV storage ring); SACLA, the XFEL facility (8-GeV linac); and the electron injector for SPring-8 (1-GeV linac and booster synchrotron).]
XFEL lasing scheme at SACLA
Lasing scheme: a thermionic e-gun (CeB6) emits free electrons, which are accelerated and compressed in the linear accelerator (L-, S-, and C-band structures) and then radiate coherent-phase X-ray laser light in the in-vacuum undulators.
SACLA uses three key technologies:
- Low-emittance injector (L-band and S-band)
- C-band (5.7 GHz) accelerating structures, x64
- In-vacuum undulators, x18 units
The whole machine is compact: ~700 m in total (400 m accelerator + 230 m undulator section + experimental hall), serving beamlines BL1-BL5 via a switching magnet. SCSS+ is under construction.
Beamlines
Three XFEL beamlines (BL1, BL2, BL3) have been constructed, and BL2 and BL3 are in operation. Fast-switched XFEL distribution between BL2 and BL3 will start in FY2015. Operation time is 7,000 hours/year.
Coherent X-ray Diffraction Imaging (CXDI)
In CXDI, each XFEL shot records a Fraunhofer diffraction pattern on a 2000-px detector before the sample is destroyed ("diffraction before destruction"), and the real-space image is recovered by phase retrieval [K. J. Gaffney and H. N. Chapman, Science 316, 1444 (2007)].
1 M diffraction images are necessary to reconstruct the 3-D structure. Considering the XFEL hit rate on the sample, 10 M shots of data are required, i.e. more than 100 TBytes of storage per sample in reciprocal space. To reconstruct the 3-D image in real space, we must analyze the relationships among the 1 M images, so XFEL experiments require heavy computing resources: about 100 TFLOPS x 3 days per sample.
Experimental setups
The experiment changes every few days (serial femtosecond crystallography, CXDI, plasma experiments, ...), so we must set up the detectors and DAQs as fast as possible.
Requirements for the DAQ and the analysis systems
- Images are accumulated at a 6 Gbps data rate: 12 MBytes (12 sensors) at 60 Hz repetition.
- One experimental sample requires 120 TBytes of storage: 12 MBytes x 10 M shots.
- The experimental period is short: the sample and/or detector setup changes every few days. Users want to use SACLA like a commercial microscope; system tuning and DAQ are not their science.
- Analysis should be faster than the DAQ data rate: 10 TFLOPS for run-by-run analysis and preprocessing (10 M shots -> 1 M shots), and 100 TFLOPS to reconstruct the 3-D structure within a few days.
DAQ and Analysis System at SACLA
SACLA DAQ and Analysis System
[System diagram, in three sections.]
- Front-end section: detectors read out over Camera Link by front-end systems and VME CPUs.
- Data transfer and accumulating section: a 1Gb Ethernet control network and a 10Gb Ethernet data network feed the data-handling servers, high-speed cache storage (200 TBytes), long-term storage (disk 1 PB + tape 7 PB), and an event-synchronized database system.
- Data analysis section: a 14 TFLOPS x86 PC cluster, a 90 TFLOPS supercomputer, and user terminals on site, plus the K computer (10 PFLOPS, Kobe) off site.
Detector Front-end System
[Same system diagram, highlighting the front-end section.]
MPCCD sensor
The MPCCD comes in Single, Double, and Octal (8x sensor) variants. Each sensor is 512 x 1024 px at 16-bit depth and is triggered at the accelerator repetition rate (60 Hz), so the data rate is 0.5 Gbps/sensor. To avoid radiation damage from the direct XFEL beam, 8+2+2 sensors (Octal + Double + Double) are installed offset from the beam line around the sample, giving a 6 Gbps total data rate. T. Kameshima, et al., Rev. Sci. Instrum. 85, 033110 (2014).
Note: all images (including junk) are recorded, because it is difficult to determine a reduction condition within the few-day experimental period.
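The quoted rates follow directly from the sensor geometry. A back-of-envelope sketch in Python (decimal Gbps/TB units assumed, matching the slides' rounding):

```python
# Back-of-envelope check of the MPCCD data rates quoted above.
PX_W, PX_H = 512, 1024        # pixels per MPCCD sensor
BYTES_PER_PX = 2              # 16-bit depth
RATE_HZ = 60                  # accelerator repetition rate
N_SENSORS = 12                # 8 + 2 + 2 sensors offset from the beam line

frame_bytes = PX_W * PX_H * BYTES_PER_PX             # ~1 MByte per sensor frame
gbps_per_sensor = frame_bytes * 8 * RATE_HZ / 1e9    # ~0.5 Gbps per sensor
total_gbps = gbps_per_sensor * N_SENSORS             # ~6 Gbps aggregate
sample_tb = frame_bytes * N_SENSORS * 10_000_000 / 1e12   # storage for 10 M shots

print(f"{gbps_per_sensor:.2f} Gbps/sensor, {total_gbps:.1f} Gbps total, {sample_tb:.0f} TB/sample")
```

This lands within rounding of the slides' 0.5 Gbps/sensor, 6 Gbps, and 120 TBytes figures (the 120 TBytes figure rounds the per-shot size to a decimal 12 MBytes).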
DAQ Network
[Same system diagram, highlighting the network backbones.]
DAQ Network
We segregated the physical backbones to guarantee image-data transfer and recording: a 1GbE network for system control and a 10GbE network for image-data transfer.
DAQ Data flow
1) Image data are transferred via the dedicated 10GbE network.
2) Live-view images are built by the data-handling servers.
3) The servers perform on-the-fly low-level filtering shot by shot; the filtering results are kept in the event-synchronized DB.
DAQ Data flow (continued)
4) Image data are recorded on the high-speed cache storage (200 TBytes).
5) Raw image data are then moved to long-term storage (disk 1 PB + tape 7 PB).
Data analysis section
Prompt data analysis (on-line / run-by-run):
- on-the-fly low-level filtering
- data assembly: convert raw images to HDF5 format
- background (dark-frame) subtraction
- gain calibration
- sensor-alignment calibration
Full data analysis (off-line):
- data selection using the low-level filtering results
- classification
- averaging
- alignment
- reconstruction of the 3-D image
Prompt data analysis
Prompt data analysis runs on-line, run by run: on-the-fly low-level filtering, data assembly (raw images to HDF5), background (dark-frame) subtraction, gain calibration, and sensor-alignment calibration.
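The dark-frame subtraction and gain calibration steps above amount to a per-shot, per-pixel correction. A minimal NumPy sketch; the function and array names are illustrative, not SACLA's actual pipeline code:

```python
import numpy as np

def calibrate_frame(raw, dark, gain):
    """Apply the prompt-analysis corrections to one sensor frame:
    subtract the dark (background) frame, then apply per-pixel gain."""
    return (raw.astype(np.float32) - dark.astype(np.float32)) * gain

# Synthetic example: a 512 x 1024 px, 16-bit MPCCD-sized frame.
raw  = np.full((512, 1024), 1100, dtype=np.uint16)
dark = np.full((512, 1024), 100,  dtype=np.uint16)
gain = np.full((512, 1024), 2.0,  dtype=np.float32)
print(calibrate_frame(raw, dark, gain).mean())   # 2000.0
```

In production the calibrated frames from all sensors are then assembled, together with sensor-alignment metadata, into HDF5 files for the off-line analysis.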
Prompt data analysis: Low-level filtering
To reduce the downstream computing load, we want to identify candidates for GOOD events. [Example of a GOOD event: a diffraction pattern. Example of a BAD event: scattering by water.]
Prompt data analysis: Low-level filtering
The intensities in each mesh cell are recorded in the event-synchronized DB. By selecting a Region of Interest (ROI), we can distinguish GOOD events from BAD events. [Scatter plot of 13,000 analyzed events: most are BAD events; the outliers are candidates for GOOD events.]
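The ROI selection described above reduces to thresholding the recorded intensity inside a region of interest. A minimal sketch; the function names and threshold value are illustrative assumptions, not the actual SACLA filter:

```python
import numpy as np

def roi_intensity(frame, roi):
    """Sum pixel intensities inside a rectangular ROI = (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = roi
    return float(frame[y0:y1, x0:x1].sum())

def is_good_candidate(frame, roi, threshold):
    """Tag a shot as a GOOD-event candidate when its ROI intensity
    (as stored in the event-synchronized DB) exceeds the threshold."""
    return roi_intensity(frame, roi) > threshold

# Synthetic shots: a diffraction-like bright region vs. a flat water-scatter frame.
roi = (10, 50, 10, 50)
good = np.zeros((100, 100)); good[20:30, 20:30] = 50.0
bad  = np.full((100, 100), 0.1)
print(is_good_candidate(good, roi, threshold=1000))  # True
print(is_good_candidate(bad,  roi, threshold=1000))  # False
```

Because only the pre-computed ROI sums are consulted, the selection can run shot by shot at the DAQ rate without touching the full images again.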
Full data analysis
Full data analysis runs off-line: data selection using the low-level filtering results, classification, averaging, alignment, and reconstruction of the 3-D image.
Full data analysis scheme
Pipeline: data selection -> classification -> averaging -> alignment in reciprocal space -> electron density -> atomic structure. Data selection reduces the 10 M frames (~100 TB) taken to the 1 M frames (~10 TB) used for analysis.
Big-data processing is indispensable for 3-D analysis: 10^7 frames are taken (including junk) to obtain the 10^6 frames for analysis, and 4 days of data taking at 60 Hz corresponds to ~100 TB of storage. The estimated CPU time for phase retrieval to reach the atomic structure is 4 weeks on a 10 TFLOPS PC cluster, or 3 days on a 100 TFLOPS supercomputer, so experimentalists require more computing power.
Resources: the DAQ storage and 14 TFLOPS PC cluster on site at SACLA, the 90 TFLOPS FX10 supercomputer, and the 10 PFLOPS K computer off site in Kobe.
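The phase-retrieval step that dominates the CPU-time estimate above alternates between a reciprocal-space constraint (the measured moduli) and real-space constraints (support, positivity). A minimal sketch of one classic scheme, Fienup-style error reduction with a known support; the production codes behind the TFLOPS estimates are far more elaborate:

```python
import numpy as np

def error_reduction(magnitudes, support, n_iter=200, seed=0):
    """Minimal error-reduction phase retrieval: iterate between imposing
    the measured Fourier moduli and a real-space support/positivity constraint."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, magnitudes.shape)
    g = np.fft.ifft2(magnitudes * np.exp(1j * phase)).real
    for _ in range(n_iter):
        G = np.fft.fft2(g)
        G = magnitudes * np.exp(1j * np.angle(G))          # keep measured moduli
        g = np.fft.ifft2(G).real
        g = np.where(support, np.clip(g, 0.0, None), 0.0)  # support + positivity
    return g
```

Each iteration costs two 2-D FFTs; repeated over ~10^6 selected frames and many iterations, this kind of workload is what drives the weeks-on-a-PC-cluster estimate.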
Data analysis plan using the K computer
We have just started a feasibility study of using the K computer not only for full data analysis but also for prompt data analysis (SACLA-K synergy):
1. We carried out a preliminary data transfer and job submission from SACLA to the K computer.
2. We achieved 6.4 Gbps bandwidth, which satisfies the experimental data rate.
The K computer (SPARC64 VIIIfx, 8 cores, 128 GFLOPS/node) belongs to HPCI, the High Performance Computing Infrastructure in Japan, which combines the calculation and storage resources of 10 institutes (Hokkaido U., Tohoku U., U. of Tsukuba, U. of Tokyo, Tokyo Institute of Technology, Nagoya U., Kyoto U., Osaka U., Kyushu U., and the K computer in Hyogo prefecture). The Government of Japan promotes the SACLA-K synergy project.
[Plot: transfer rate (MB/s) vs. degree of parallelism for A: gsiscp, B: gfpcp, and C: gfpcp (tuned); C reaches 800 MB/s = 6.4 Gbps at around 100-way parallelism. Measurement supported by AICS/RIKEN.]
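The plot shows throughput growing with the degree of parallelism. The tools measured there (gsiscp, gfpcp) are HPCI-specific, but the underlying pattern, splitting one transfer across many concurrent streams, can be illustrated generically; everything below (function name, worker count) is an assumption for illustration only:

```python
# Illustrative parallel-transfer pattern (NOT the gsiscp/gfpcp tools measured
# on this slide): copy a list of run files using N concurrent streams.
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def transfer_run(files, dest_dir, parallelism=8):
    """Copy each file into dest_dir with `parallelism` concurrent workers,
    the way multi-stream transfers hide per-stream latency."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(shutil.copy, f, dest / Path(f).name) for f in files]
        return [fut.result() for fut in futures]   # re-raise any copy error
```

In the actual SACLA-K measurement, throughput saturated near 800 MB/s (6.4 Gbps) at around 100-way parallelism.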
Summary
SACLA provides new opportunities to study complex targets using XFEL, in particular to reveal the 3-D structure of proteins. We developed the DAQ and analysis systems for SACLA to satisfy the experimental requirements:
- 6 Gbps data rate
- 120 TBytes of data storage per protein sample
- short experimental period (a few-day cycle)
- high computing power: 10 TFLOPS for run-by-run prompt analysis, 100 TFLOPS for full data analysis... and more computing power!
We also started a feasibility study of joint analysis using the K computer, the 10 PFLOPS supercomputer at Kobe.
Backup