
CRISP WP18
Requirements for Data Recording to Storage Media
CRISP Milestone 3

Document identifier: CRISP_MS3.doc
Date: 30 June 2011
Authors: D. Boukhelef, A. Goetz, N. Ménard, R. Mudingay, B. Nikolic, J-F. Perrin, S. Skelboe, F. Schluenzen, J. Szuba, H-J. Weyer, K. Wrona

Abstract: Requirements for recording data to storage media are described for each participating Research Infrastructure. An analysis of the requirements is presented. Synergies identified are outlined.

CRISP - Cluster of Research Infrastructures for Synergies in Physics

Introduction

Rapid developments and increasingly complex experimental techniques in many scientific domains, as well as the use of highly advanced instruments and detectors, result in extremely high data rates exceeding tens of GB/s. Identifying cost-effective ways of recording data to storage systems and archives becomes an increasingly complex and challenging task. In many cases, data originate at multiple sources and must be merged in order to provide a consistent set of information suitable for further processing. Ensuring the integrity of data while it is transferred, stored, and archived becomes difficult when dealing with multiple data streams, very high data rates and large accumulated data volumes.

Applied method of gathering the requirements

The requirements described in this document were gathered via regular phone conference discussions between the participating institutes, expert discussions within the facilities, analysis of the DM&IT questionnaire and detailed discussions at the 1st annual CRISP meeting (23-25 April 2012), where WP18 organized two dedicated sessions. The first session consisted of presentations from the six participating projects: ESS, ESRF-Upgrade, EuroFEL, European XFEL, ILL2020, and SKA. Each presentation started with an introduction to the research infrastructure, allowing all participants to learn more about the general project goals, scientific programme and time schedule. The main purpose of these presentations was to explain the requirements and challenges concerning data recording and data access. The material presented consisted of a description of data sources, expected data rates, anticipated data flow diagrams, ideas for online data processing, data protection requirements and concepts for data archiving. Each presentation was followed by a short discussion. In the second session these requirements were summarized and thoroughly discussed, focusing on the identification of common areas of interest and on developing a plan for further work.

Requirements from Research Infrastructures

ESRF Upgrade

The ESRF is a third-generation synchrotron radiation source with more than 40 experiments running simultaneously. The ESRF has been operating in user mode for almost 20 years. The data volume produced has continued to grow exponentially. Figure 1 summarises the usage of the central storage disk at ESRF.

Figure 1: Usage [TB] of the central disk storage space at ESRF.

The high data-transfer rate needed in the short and medium term can be summarised as follows: data from multiple detectors (up to 3, each of up to 16 megapixels, 2 to 4 bytes per pixel) on multiple beamlines (up to 10) producing data at 200 MB/s (now) and 5 times more in the future (2 to 3 years), with sustained peak performance lasting minutes to hours. This translates to a maximum data rate of 21.7 TB/hour (6 GB/s) in the worst case, i.e. if 10 beamlines take data at full speed, each using 3 detectors simultaneously. In 3 years' time, as experience shows, this will have to be multiplied by a factor of 5, resulting in 100 TB/hour (30 GB/s) in the worst case. In reality not all beamlines use 3 detectors simultaneously and the average data rate is approximately a factor of 3 less (1 detector taking data at 10 beamlines simultaneously), which implies:

Peak write/read rate = 7 TB/h or 2 GB/s (2012) and 33 TB/h or 10 GB/s (2015).

These are sustained peak values. In reality the cycle time of experiments is much longer, which means the average is less. If we assume a reasonable factor of 10 less, then the following average values are estimated:

Average write/read rate = 700 GB/h or 0.2 GB/s (2012) and 3 TB/h or 1 GB/s (2015).

The above rates are the sum for all experiments at the ESRF. In reality we are dealing with N experiments with very different data rates. One experiment can have a high sustained data rate while another experiment produces close to nothing.
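The worst-case and average figures above follow from a few multiplications; a minimal sketch of the arithmetic, using only the detector and beamline counts quoted in the text, is shown below.

```python
# Back-of-the-envelope check of the ESRF rate estimates quoted above.
# Assumptions (taken from the text): 10 beamlines, up to 3 detectors each,
# 200 MB/s per detector today, a factor 5 increase within 2-3 years.

PER_DETECTOR_MB_S = 200          # MB/s per detector (2012)
BEAMLINES = 10
DETECTORS_WORST = 3              # worst case: 3 detectors per beamline
DETECTORS_TYPICAL = 1            # typical case: 1 detector per beamline
GROWTH = 5                       # expected growth factor by ~2015

def rate_gb_s(detectors_per_beamline, growth=1):
    """Aggregate rate in GB/s for all beamlines taking data simultaneously."""
    mb_s = PER_DETECTOR_MB_S * detectors_per_beamline * BEAMLINES * growth
    return mb_s / 1000.0

worst_now = rate_gb_s(DETECTORS_WORST)            # ~6 GB/s  (~22 TB/h)
worst_2015 = rate_gb_s(DETECTORS_WORST, GROWTH)   # ~30 GB/s (~100 TB/h)
peak_now = rate_gb_s(DETECTORS_TYPICAL)           # ~2 GB/s  (~7 TB/h)
avg_now = peak_now / 10                           # duty-cycle factor of 10

print(f"worst case now : {worst_now:.1f} GB/s ({worst_now * 3.6:.1f} TB/h)")
print(f"worst case 2015: {worst_2015:.1f} GB/s ({worst_2015 * 3.6:.1f} TB/h)")
print(f"sustained peak : {peak_now:.1f} GB/s, average ~{avg_now:.1f} GB/s")
```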

This means the proposed solution has to provide point-to-point performance which can be very high. The way the synchrotron produces data excludes spreading the data load over all experimental stations. A very important issue is to be able to analyse (read) the data as soon as possible after it was taken (written). This means that read performance has to be as good as the write performance. The following specific needs have been identified:

- dedicated buffer for caching data from the detector for fast online data processing
- dedicated buffering capacity of up to 2 days (i.e. a weekend)
- solution for exporting the user's raw and processed data automatically
- online data processing buffer big enough to hold a full experiment
- link from the online data processing PC to central storage for writing or reading results
- Linux (Debian/RHEL) and Windows (7) detector PCs
- NFS (V4+V3) and CIFS
- listing files in < 3 s
- multiple 10 Gb/s Ethernet links per beamline
- mounting user storage on the online data processing PC
- automatic export of analysed data to the user's export medium
- read speed (for data analysis) at least as good as write speed, e.g. read performance for tomography of 350 MB/s on average for 24 hours

The main needs identified based on the above are:

- a dedicated buffer for guaranteeing data rates from detectors
- an online data analysis PC with its own dedicated buffer
- automatic export of the raw and processed data to an export medium accessible to users, so that they can take the data home as soon as the experiment is over.

The dedicated buffer needs to be synchronised with the central storage for backup. It can be physically located wherever it is most convenient, e.g. in the data centre. It is important that it is dedicated to a particular experiment and does not have to be shared.
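A dedicated buffer sized for the "2 days (weekend)" requirement follows directly from the sustained rates above; a minimal sizing sketch is given below, assuming per-beamline sustained peak rates obtained by dividing the aggregate 2012 and 2015 peaks by the 10 beamlines.

```python
# Rough sizing of a per-beamline buffer able to absorb a weekend (2 days)
# of unattended data taking. The per-beamline rates are assumptions derived
# from the text (aggregate sustained peak divided by 10 beamlines).

SECONDS_PER_DAY = 24 * 3600

def buffer_tb(rate_gb_s: float, days: int = 2) -> float:
    """Capacity in TB needed to hold `days` of data at `rate_gb_s`."""
    return rate_gb_s * SECONDS_PER_DAY * days / 1000.0

for year, per_beamline_gb_s in (("2012", 0.2), ("2015", 1.0)):
    print(f"{year}: ~{buffer_tb(per_beamline_gb_s):.0f} TB for a 2-day buffer "
          f"at {per_beamline_gb_s} GB/s per beamline")
```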

Figure 2: Sketch of the data flow at ESRF.

ESS

ESS is planned to start generating neutrons with 7 instruments in operation in 2019 and to be fully configured with 22 instruments some years later. It is going to be a pulsed neutron source with a 14 Hz pulse repetition rate and 2.86 ms pulse length. With a duty cycle of 4% and an average neutron intensity similar to ILL, the neutron pulses are going to be very intense. Whether these intense pulses result in similar bursts of data depends on the design of the instruments. The following high-level requirements for data recording to storage media have been identified:

- Data collected at ESS will be metadata and neutron data or image data.
- All data will be time stamped with the least significant 32 bits of the global 64-bit clock. The most significant 32 bits will be stored whenever necessary.
- Each neutron detected will be recorded in event mode with detector location and time stamp, 32 bits for each part.
- Images will be time stamped with the information necessary to characterize the energy spectrum.
- A non-exhaustive list of metadata to be stored and maintained includes: proton-pulse data, moderator temperature, neutron flux, chopper settings, measured speed and phase, instrument settings, sample position, and sample environment including temperature, pressure, magnetic field and mechanical strain.
- The neutron data rate from an instrument may be up to 400 MB/s. The final choice of instruments has not yet been made, and therefore this number is rather uncertain.
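The 32-bit + 32-bit event-mode record and the split 64-bit time stamp described above map naturally onto a fixed 8-byte record; the sketch below is a minimal illustration (the field order and endianness are assumptions for illustration, not an ESS specification).

```python
import struct

# Illustrative 8-byte event record: 32-bit detector location (pixel ID)
# followed by the least significant 32 bits of the global 64-bit clock.
EVENT_FORMAT = "<II"        # little-endian: uint32 detector_id, uint32 time_lsb
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)   # 8 bytes per neutron event

def pack_event(detector_id: int, timestamp_64: int) -> bytes:
    """Pack one neutron event, keeping only the 32 LSBs of the clock."""
    return struct.pack(EVENT_FORMAT, detector_id, timestamp_64 & 0xFFFFFFFF)

def unpack_event(record: bytes, time_msb: int) -> tuple:
    """Recover detector id and full 64-bit time, given the stored MSBs."""
    detector_id, time_lsb = struct.unpack(EVENT_FORMAT, record)
    return detector_id, (time_msb << 32) | time_lsb

# At 400 MB/s and 8 bytes per event this corresponds to ~50 million events/s.
events_per_second = 400e6 / EVENT_SIZE
print(f"{EVENT_SIZE} B/event -> {events_per_second / 1e6:.0f} Mevents/s at 400 MB/s")
```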

The same uncertainty also applies to the maximum file size and the total amount of generated data given below. Data files for an experiment are expected not to exceed 5 TB. During the experiment, data is collected in a temporary file on a local computer dedicated to the instrument. Either the experiment is considered to have failed and the data is deleted, or the experiment is approved and the raw data is committed and transferred to permanent storage. Committed data is read-only. The collection of data must take place without interrupting or otherwise interfering with the experiment. The collection of data must permit streaming data analysis in order to provide the user with information on the progress of the experiment. The total amount of data collected per year from 22 ESS instruments is expected not to exceed 5 PB. The permanent storage system should be hierarchical and hold recent data and data being analysed on disk. Less frequently used data may be migrated to tape until it is possibly used again. Data is expected to be stored forever.

EuroFEL

The EuroFEL project is a joint effort of 7 partners involved in the construction or operation of a Free Electron Laser facility. Most of the FELs are still under construction or in the commissioning phase. Precise figures for data requirements from FELs are hence difficult to project. However, most of the EuroFEL members also operate a synchrotron light source, and experiments performed at LCLS can provide a fair account of current and future data requirements. Requirements derived from the DESY facilities serve as an example of EuroFEL requirements and are summarized below.

DESY currently operates two synchrotron-light sources (DORIS, Petra III) and a VUV FEL, more than 50 instruments in total. DESY is also a stakeholder of the European XFEL GmbH. However, requirements from the European XFEL are not taken into account here, though they fully apply as well. PSI has a research environment similar to DESY: it runs a synchrotron (SLS), and the free-electron-laser project SwissFEL is under development. Thus, the conclusions from DESY in this paper can be used for PSI as well. The DESY-CFEL group is a quite demanding user group running experiments at various synchrotrons in Europe and the US, and in particular at the X-ray FEL LCLS. The experience with LCLS may well give an impression of what kind of data rates and volumes are expected from fully operational X-ray FELs. During the last two years, DESY-CFEL did a small number of experiments at LCLS, each covering typically 2-4 weeks of beamtime. The accumulated amount of data created, transferred to and archived at DESY is shown in Figure 3.

Figure 3: Data taken by DESY-CFEL at LCLS during the last two years.

The sustained data rate is of the order of terabytes per week; the total volume is 700 TB. For the synchrotron-light sources, the PNI-HDRI high-data-rate project (a joint project of German HGF RIs) made estimates and projections for current and future data rates and volumes for selected experimental techniques. The estimates are summarized in Table 1. These estimates were made before the Petra III beamlines became fully operational. Meanwhile the in-situ imaging beamline P02 is running a Perkin Elmer detector with 15 frames (16 MB/frame) per second. Peak rates are hence at ~240 MB/s, with sustained averages of ~200 MB/s for several days to weeks. The protein-crystallography beamline at Petra III (P11) uses a Pilatus 6M detector, which can operate at a frame rate of 25 Hz. The peak data rates are around 350 MB/s; the average data rates depend on the mode of operation. The estimates originally made for Petra III were actually not far off, and the on-going detector developments will further increase the data rates within the coming years. Currently DESY plans for a storage infrastructure with 1.6 PB per year for Photon Science research data at DESY.
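The per-detector figures quoted above follow from frame size times frame rate; a small sketch of that arithmetic is shown below, using the Perkin Elmer parameters given in the text (the 7-day run length in the volume estimate is an assumption standing in for "several days to weeks").

```python
# Peak detector data rate = frame size x frame rate.
# Numbers for the Perkin Elmer detector at the P02 beamline are taken
# from the text (16 MB/frame at 15 frames per second).

def peak_rate_mb_s(frame_mb: float, frames_per_s: float) -> float:
    """Peak data rate in MB/s for a detector streaming full frames."""
    return frame_mb * frames_per_s

perkin_elmer = peak_rate_mb_s(frame_mb=16, frames_per_s=15)
print(f"Perkin Elmer @ P02: ~{perkin_elmer:.0f} MB/s peak")   # ~240 MB/s

# Accumulated volume for a multi-day run at a sustained average of 200 MB/s:
days = 7                                   # assumed run length for illustration
volume_tb = 200e6 * 86400 * days / 1e12
print(f"{days} days at 200 MB/s -> ~{volume_tb:.0f} TB")
```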

Table 1: Requirements in terms of data rates and processing time for various applications at synchrotron sources, gathered by the PNI-HDRI project.

Data streams from multiple detectors need to be aggregated and transferred. The peak data rates are estimated to reach up to 18 TB/h (5 GB/s), with sustained average data rates of up to 4 TB/h (1 GB/s). For some FEL experiments, peak and sustained rates could reach up to 30 TB/h (8 GB/s) for a single experiment. Summing up the data rates from instruments at Petra III (not including FEL data):

Sustained peak write/read rate = 1-10 TB/h or 0.3-3 GB/s (2012) and 10-50 TB/h or 3-15 GB/s (2015).

The number of files created can exceed 10^5/s (2012) and 10^8/s (2015).

These are sustained peak values. The sustained average rates are certainly lower at some instruments, where some modification of the equipment between experiments is required, or for beamlines where sample handling requires lengthy manual interventions. Some beamlines can however operate without significant interruption (e.g. tomography or crystallography equipped with automatic sample changers and pre-characterized samples). Hence estimates for average rates cover a wider range:

Average write/read rate = 1-25 TB/h or 0.3-8 GB/s (2015), with correspondingly lower values for 2012.

Naturally, data rates heavily depend on the experimental technique and instrument. However, one important aspect is the concurrency of the data streams. The number of beamlines at DESY is currently about 50. Each of the beamlines produces an independent data stream; peak data rates can occur simultaneously, but don't have to. Balancing data loads over beamlines is not an option, and data streams from different experiments should not interfere at all. An important issue is to be able to analyse (i.e. read) the data, or at least samples of the data, as soon as possible after it is taken (i.e. written). This means that read performance has to be as good as write performance. At DESY the dCache system is used as the storage/archive backend and, wherever feasible, also for online processing. At SLS a GPFS-based system is used for data storage. The choice of system has a certain impact on the requirements in terms of data aggregation and supported protocols, but should not affect the basic requirements, since the data flow model is very similar regardless of the system in use. A typical data flow is shown in Figure 4. The requirements are also very similar to those given by ESRF:

- Dedicated buffer to keep up with data rates from the detector
  o data loss is not acceptable
  o permit fast online data processing
  o enable image corrections, trigger and quality indicators
  o enable compression, conversion, data aggregation
- Dedicated buffering capacity to cover at least 2-3 days
  o experiments should not be hampered by data export
- Support for Linux and Windows (7) detector PCs
  o Windows e.g. PCO, Perkin Elmer, PSI, Roper Scientific
  o Linux e.g. Pilatus, Maxipix, MAR, LCX
- Online data processing buffer
  o Large enough to cover a significant number of experiments. Processing time can exceed the time to create the data by orders of magnitude.
  o In some cases, support for parallel/cluster file systems and MPI I/O is preferable.
  o No interference between online processing and the experiment.
  o Permit initial real-time analysis for quality assessments.

- Link from the online data processing PC to central storage to write or read results
  o Decouple data transfer to storage from the experiment.
  o Data should be available for offline analysis essentially as soon as an experiment has been terminated.
- Rapid transfer of data downstream (dCache) without interference with the experiment or online analysis
- Support for ACLs and several protocols like NFS V4.1 and V3, CIFS, WebDAV
  o Data protection required at all stages.
- Speed:
  o Read speed (for data analysis) at least as good as write speed; 500 MB/s for several days from a single experiment.
- Export of analysed data to the user's export medium
  o Support for various transfer protocols required.
  o A replication service might become beneficial.
- Automatic registration of raw and analysed data in a data catalogue
- Scalability:
  o The number of instruments/beamlines is rapidly growing.
  o The next generation of X-ray detectors, e.g. the Pilatus successor Eiger, can operate at frame rates of up to 22 MHz per array. The Eiger detector will be composed of several arrays. Each array will produce up to 5 GB/s. Even worse, it might produce several million files per second. The speed of metadata operations on the filesystem might become an issue.
- Reliability/availability:
  o Beamtime is precious and costly.
  o Data are irreproducible in some cases.
  o Data integrity needs to be guaranteed at all stages of the data chain.
- Cost efficiency:
  o The data volumes to be kept online are rapidly increasing, due to the increasing size of a single dataset, the increasing online compute capabilities and the increasing complexity (i.e. time to analyse a dataset).
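Several items in the list above amount to decoupling detector writes from downstream transfer through a dedicated buffer. The sketch below is a hypothetical illustration of that pattern, with a bounded in-memory queue standing in for the disk buffer; names, sizes and the storage step are made up and do not represent a DESY implementation.

```python
import queue
import threading

# Illustrative decoupling of detector writes from downstream transfer:
# the "detector" must never block on central storage, so a bounded buffer
# (standing in for the dedicated disk buffer) absorbs bursts while an
# independent "transfer" thread drains it.

buffer = queue.Queue(maxsize=256)
SENTINEL = None

def detector(n_frames: int, frame_size: int = 1 << 20) -> None:
    """Produce synthetic frames and hand them to the buffer."""
    for frame_id in range(n_frames):
        frame = bytes(frame_size)            # placeholder 1 MB payload
        buffer.put((frame_id, frame))        # a real system would alarm on overflow
    buffer.put(SENTINEL)

def transfer() -> None:
    """Drain the buffer towards central storage at its own pace."""
    while (item := buffer.get()) is not SENTINEL:
        frame_id, frame = item
        _ = (frame_id, len(frame))           # here: write to dCache/GPFS, verify, release

producer = threading.Thread(target=detector, args=(100,))
consumer = threading.Thread(target=transfer)
producer.start(); consumer.start()
producer.join(); consumer.join()
print("all frames handed off through the buffer")
```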

Figure 4: Example of the data flow at a Petra III beamline.

European XFEL

The requirements for high-speed data recording at the European XFEL are driven by the parameters of the photon beams, the characteristics of the detectors used and the operation modes of the experiments. A schematic view of the photon beamlines is presented in Figure 5. According to the design, 3 concurrent experiments may be performed. The time profile of the photon beam is depicted in Figure 6. The train rate is 10 Hz, i.e. 100 ms between train heads, and the train length is 600 µs. The maximal bunch rate within a train is 4.5 MHz. Other bunch rates can be produced by removing or not creating bunches.

The 2D pixel detectors place the largest demands on data recording. Their main characteristics are:

- a 1 megapixel detector with a pixel data size of 2 B, resulting in a 2 MB frame size;
- data read from the detector through custom hardware (the train builder) at a maximum rate of 512 frames per train;
- data sent through 16 x 10GE links, giving a total readout rate of 10 GB/s.

The development of larger detectors (2k x 2k = 4M pixels) and an increase of the number of frames per train to 1024 can be expected.
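The 10 GB/s figure follows from the train structure above; a minimal arithmetic sketch is given below (the 4 Mpixel, 1024-frames-per-train case is an extrapolation of the growth path mentioned in the text, not a design value).

```python
# Readout rate of a European XFEL 2D pixel detector from the train structure
# described above: frames/train x frame size x trains/second.

TRAIN_RATE_HZ = 10            # trains per second

def readout_gb_s(pixels: int, bytes_per_pixel: int, frames_per_train: int) -> float:
    frame_bytes = pixels * bytes_per_pixel
    return frame_bytes * frames_per_train * TRAIN_RATE_HZ / 1e9

current = readout_gb_s(pixels=1024 * 1024, bytes_per_pixel=2, frames_per_train=512)
future = readout_gb_s(pixels=2048 * 2048, bytes_per_pixel=2, frames_per_train=1024)

print(f"1 Mpixel, 512 frames/train : ~{current:.1f} GB/s")   # ~10 GB/s over 16 x 10GE
print(f"4 Mpixel, 1024 frames/train: ~{future:.1f} GB/s")    # extrapolated growth case
```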

Figure 5: Planned photon beamlines at the European XFEL.

Figure 6: Time structure of the X-ray beam at the European XFEL.

The most demanding instrument in terms of data recording, SPB, will in its initial configuration consist of a number of 4.5 MHz-capable detectors:

- a 1024 x 1024 pixel 2D camera for imaging
- potentially a smaller, possibly 256 x 256 pixel, detector as a wavefront monitor
- a single eTOF digitizer (10 GS/s with 10-bit resolution)
- a single-channel APD-type fluorescence detector

In addition, a number of general beam diagnostic devices will supply data streams which need to be correlated with the instrument detectors. Depending on the operational mode, the data volume accumulated per day may vary from several TB up to 400 TB.

There is a clear need to identify and reject, as soon as possible, data which are not useful for further analysis. These may be data from bunches where the FEL pulse has not interacted with a sample, or where the events were not clean enough. Due to limitations in the sample delivery methods for some types of experiments, the fraction of good events may be of the order of a few percent only. Although future developments may improve delivery mechanisms, resulting in a higher hit rate, the technique of bad-data rejection must be planned and incorporated directly into the data acquisition chain. Rearrangement of internal detector data, building of complete data frames, data rejection, calibration, data formatting, compression, background discrimination, and consistency checks will require significant processing capabilities to be available within the data recording chain.

The disk storage system must be capable of saving data from the experiments at both the online and offline stages. The online storage should allow for immediate data access in order to be able to assess the data quality and perform data processing in a manageable manner. Online storage also serves as a local buffer if the connection to the offline storage located in the computer centre is disrupted. The estimated capacity of the online storage buffer is 0.5 PB per experiment. The offline storage should serve as a source of data for reconstruction and user analysis, giving free or semi-managed access on a much longer time scale. Data collected from experiments must be kept on a storage system as long as it is required by analysis or until the data is exported to the user's home institute. If the analysis is conducted on site, the possibility of storing and accessing temporary data created during analysis as well as the final analysis results must be provided to users. The access protocols should preferably be standardized (e.g. NFS 4.1). The final stage of data recording is an archive, i.e. a secure and long-term data storage system. Restoring data from an archive should be done in a managed way. The archive system implementation considered initially is based on tape media and dCache. Data reconstruction and analysis will require CPU- and GPU-based computing clusters with optimized access paths to the offline storage. Local node storage caches as well as a cluster file system for fast data access are needed.

The requirements on the data recording system can be summarized as follows:

- The data acquisition system must be able to send data through multiple 10GE links using the UDP protocol.
- The aggregated speed of formatting and writing data to files must be sufficient to sustain the design acquisition rate (10 GB/s per detector).
- A possibility must exist to reject single or multiple records based on information obtained from the veto system.
- The format must be self-describing, encoded in a platform-independent way, and based on software tools recognized by the scientific communities; the initial candidate is HDF5.
- Internal compression of single records as well as collections of records should be possible.
- Data from multiple sources must be correlated using train and bunch numbers.
- Almost real-time access to data for experiment-specific data evaluation is required.
- Control of the file size is needed. Files will contain multiple data records and images. Multiple trains per file must be possible.
- Raw data files will be immutable.
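As an illustration of several of the requirements above (self-describing HDF5 files, per-train records, a veto-based rejection mask, internal compression), the sketch below writes a few synthetic trains to an HDF5 file with h5py. The file layout and dataset names are invented for illustration and are not the European XFEL data format.

```python
import numpy as np
import h5py

# Illustrative only: synthetic frames, a per-frame "good" mask standing in for
# the veto decision, and gzip-compressed, extensible datasets.
FRAMES_PER_TRAIN = 8            # far below the real 512, to keep the example small
FRAME_SHAPE = (1024, 1024)

with h5py.File("trains_example.h5", "w") as f:
    images = f.create_dataset(
        "INSTRUMENT/detector/image",
        shape=(0, *FRAME_SHAPE), maxshape=(None, *FRAME_SHAPE),
        dtype="uint16", chunks=(1, *FRAME_SHAPE), compression="gzip")
    train_ids = f.create_dataset("INDEX/trainId", shape=(0,), maxshape=(None,),
                                 dtype="uint64")
    for train_id in range(3):                        # three synthetic trains
        frames = np.random.randint(0, 1 << 12,
                                   size=(FRAMES_PER_TRAIN, *FRAME_SHAPE),
                                   dtype=np.uint16)
        good = np.random.rand(FRAMES_PER_TRAIN) < 0.1   # veto rejects ~90% of frames
        kept = frames[good]
        if len(kept) == 0:
            continue                                 # nothing useful in this train
        n0 = images.shape[0]
        images.resize(n0 + len(kept), axis=0)
        images[n0:] = kept
        train_ids.resize(n0 + len(kept), axis=0)
        train_ids[n0:] = train_id

print("wrote trains_example.h5")
```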

SPIRAL2

Considering the complexity of the future detectors for SPIRAL2, most of them will have dedicated electronics. Some of them will be located at GANIL, but most of them will also be used in different laboratories. It is therefore important to consider the concept of DAQ subsystems which can be interconnected when detectors are operated together. Most of the new detectors will provide high data flows from different branches, which will require online event building to merge the branches and filtering to reduce the amount of data to be stored; a simple sketch of such a merge step is given after Table 2. For some of the detectors, an evaluation of the data rates and of the amount of data to be stored is given in Table 2.

                 Data bandwidth [MB/s]      Data generated [TB/day]
Detector         Raw         Filtered       Raw          Filtered     Comment
AGATA                                                                 Evaluation for GANIL phase with 15 Triple Clusters (2014)
EXOGAM2          30 to 90                   3 to 8                    Depends on scenarios
NEDA             15 to                      to 5.5                    Depends on embedded compression algorithms and scenarios
ACTAR S

Table 2: Provisional data bandwidth for SPIRAL2 detectors.
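The online event building mentioned above amounts to merging several time-ordered branch streams into one stream and discarding events that fail a filter; the sketch below is a minimal, hypothetical illustration in generic Python, not the GANIL/SPIRAL2 DAQ software.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# A "branch" is modelled as a time-ordered iterator of (timestamp, branch_id, payload).
Event = Tuple[int, str, bytes]

def build_events(branches: Iterable[Iterator[Event]],
                 keep=lambda e: True) -> Iterator[Event]:
    """Merge time-ordered branches into one stream and drop filtered-out events.

    heapq.merge performs the k-way merge on the leading timestamp, which is the
    essence of event building; `keep` stands in for the online filter that
    reduces the volume written to storage.
    """
    for event in heapq.merge(*branches):          # sorted by timestamp first
        if keep(event):
            yield event

# Tiny usage example with two synthetic branches and a trivial filter.
branch_a = iter([(10, "A", b"..."), (30, "A", b"..."), (50, "A", b"...")])
branch_b = iter([(20, "B", b"..."), (40, "B", b"...")])
merged = list(build_events([branch_a, branch_b], keep=lambda e: e[0] % 20 != 0))
print(merged)    # events with timestamps 10, 30, 50 survive the filter
```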

Considering that some of these detectors can be coupled, the data bandwidth to be considered is of the order of a few hundred MB/s. One also needs to take into account that there can be 2 experiments running at the same time. Depending on the collaboration, GANIL/SPIRAL2 will have to provide all or part of the network infrastructure and storage. For instance, the AGATA collaboration will provide its own data storage system, but asks for data bandwidth to back up to an external computer centre. The requirements for data storage at GANIL/SPIRAL2 for the near future are the following:

- Data input: 2 x 300 MB/s (5 Gbit/s effective, 10 Gbit/s to be considered). The local network has to be upgraded to manage this bandwidth in the experimental areas.
- Data output to be defined to enable quasi-online data analysis on multiple clients with 1GE network interfaces. The local network has to be upgraded to manage this bandwidth in the experimental areas and in the main building.
- Several hundreds of TB of extensible storage in a highly available architecture (24x7) to store the experimental data of the current campaign.
- Possibility to send data to a data centre such as CC-IN2P3 (several TB/day), or to store experimental data locally for at least a year for analysis.
- Experimental data have to be shared by heterogeneous clients (Linux, Windows and Mac OS X) with a standard protocol such as NFS.
- User identification for both local and remote access to data; rights for users and groups have to be managed with a standard protocol such as LDAP.

Depending on future decisions, it could be necessary to have a computing farm dedicated to the reduction of the data flow, a computing farm for data analysis, and a storage and backup infrastructure for a complete year (more than 500 TB).

ILL2020

The Institut Laue-Langevin operates one of the most intense neutron sources in the world, feeding intense beams of neutrons to a suite of 40 high-performance instruments that are constantly upgraded. An ambitious modernization programme was launched in 2000, through the design of new neutron infrastructure and the introduction of new instruments and instrument upgrades. The first phase resulted in 17-fold gains in performance. The second phase began in 2008 and comprises the building of 5 new instruments, the upgrade of 4 others, and the installation of 3 new neutron guides.

Standard data workflow

During a typical experiment data flow, acquired data is stored on the local buffer, which is also used as the instrument control computer. This permits easy access to the data for analysis and calibration. Data is also sent remotely to the archive system. Access to experimental data can therefore be gained by experimenters either through the local instrument buffer or on the central archive.

Volume of data

Due to continuous detector improvements there is a regular but slow increase in the volume of experimental data. Since 2011, some instruments have begun to generate a greater volume of data, which is expected to continue to grow towards 30 TB by the end of 2012 (40 times the usual volume per cycle). The IT service has to take this into consideration, in terms of storage but also in terms of infrastructure (network, backup, workstation capacities) and the general scientific workflow. We are leaving the world of 1-3 TB per year towards a higher output of data. Our projections are not perfectly precise at the time of writing this paper, but what is certain is that this evolution will seriously impact the IT infrastructure.

Figure 7: Evolution of data volume at ILL (Gb per cycle, with exponential and linear trend lines).

Introduction of the ILL data policy

In December 2011 the ILL introduced a Data Policy in order to increase the scientific value of the data by opening it up to a wider community for further analysis and fostering new collaborations between scientific groups.

A necessary embargo period was introduced in order to provide time for the users to finalize and publish their work.

Security as the main challenge

The recent introduction of the ILL data policy and the different projects towards a better annotation of experimental data have raised the need for security and regulated access control. This implies the introduction of strict Access Control Lists (ACLs) on the different data file storage units. Those ACLs should be automatically derived from the proposal database, taking into consideration the users taking part in the experiments. This also implies provisioning different secure protocols to match the heterogeneous operating systems used by the scientists and imposed by workflows. Amongst the different protocols (rsync, CIFS, NFS), a special focus will be put on NFS. We need to ensure that it is the user who is authenticated and not only the workstation; this has not been the case with previous versions of the NFS protocol. The latest version of NFS (release 4) introduces the possibility to integrate with a Kerberos service as a means to authenticate users and protect access to data. Inside the work package we intend to study the feasibility and consequences of implementing a Kerberos and NFSv4 infrastructure as a viable means of securing access to experimental data files.
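The automatic derivation of ACLs from the proposal database described above can be pictured as a simple mapping from proposal participants to per-dataset access entries. The sketch below is purely illustrative: the proposal record layout and the NFSv4-style ACE strings are assumptions, not the ILL schema or configuration.

```python
from dataclasses import dataclass, field

# Hypothetical proposal record; the real ILL proposal database schema differs.
@dataclass
class Proposal:
    number: str
    main_proposer: str
    participants: list = field(default_factory=list)

def nfs4_aces(proposal: Proposal, domain: str = "example.org") -> list:
    """Build NFSv4-style allow entries for everyone named on the proposal.

    Entry format type:flags:principal:permissions (read-oriented permissions
    only), used purely to illustrate ACLs derived from proposal metadata.
    """
    users = [proposal.main_proposer, *proposal.participants]
    return [f"A::{user}@{domain}:rxtncy" for user in users]

p = Proposal(number="8-01-123", main_proposer="alice", participants=["bob", "carol"])
for ace in nfs4_aces(p):
    print(ace)
# During the embargo period only these entries would be applied to the dataset
# directories; after the embargo a wider-community entry could be added.
```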

SKA

The Square Kilometre Array (SKA) is the next-generation radio telescope, to be built by a large international consortium (currently including the UK and the Netherlands from the EU) and currently in the design stages. The SKA will have about five square kilometres of collecting area and very advanced computational facilities, giving it a sensitivity about 100 times better than the best current telescopes and a sky survey speed a million times greater than current facilities. These improvements over current capabilities will allow the SKA to observe high-energy processes and atomic hydrogen in the universe out to the epoch of re-ionisation, giving us a new view into fundamental physics, astronomy and cosmology.

Figure 8: Illustration of the data flow for the SKA1 experiment "Epoch of Reionisation", with estimated data flow rates (which are below the maximum estimated rates for other experiments).

The computational load of processing the data received by the SKA receptors is likely to be a limiting factor in the scientific capabilities of the telescope, and therefore the scientific data processing system is one of the key R&D sub-projects within the on-going SKA design. One of the key challenges is the combination of very high data rates and the iterative nature of the calibration algorithm of the telescope. This means that a fully streaming architecture is not possible and instead data must be stored in a UV data store for the duration of a single observation (typically about five hours). This UV data store must support very fast data recording and retrieval and is the main part of the SKA which will benefit from CRISP WP18. The requirements listed below are current baseline requirements derived from the analysis of some SKA1 experiments. It is likely that these will evolve somewhat over time as the design process continues, but they are likely to be representative:

- Write throughput: Maximum expected write data rate of 330 GB/s.
- Read throughput: Maximum expected read data rate of 1650 GB/s.
- Storage duration: Expected duration of storage of data is 5 hours.
- Storage capacity: Required capacity is about 6 PB.
- Hardware interfaces: Industry-standard hardware interfaces suitable for use with supercomputers.
- Data format: The data format is most likely to be a custom binary format, although HDF5 or some derived technology might be used.
- Security: No security/quota/user permissions required, as both ends are directly controlled by SKA software.
- Real-time: Soft real-time write operation, as buffering in the correlator is likely to be limited. Consistent, predictable performance for both reads and writes is required.
- Reliability: Loss of binary data is not a problem as long as it is correctly flagged and does not cause hold-ups to processing.
- Deployment timeframe: Full production is expected at the beginning of ...; design reviews are scheduled in ...
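The storage capacity and read throughput above are consistent with the write rate, the five-hour observation and the five calibration passes mentioned in the following paragraph; a small consistency-check sketch is given below.

```python
# Consistency check of the SKA UV data store baseline numbers quoted above.

WRITE_GB_S = 330          # maximum expected write rate
OBSERVATION_H = 5         # data kept for the duration of one observation
CALIBRATION_CYCLES = 5    # the calibration is assumed to need five full passes

capacity_pb = WRITE_GB_S * OBSERVATION_H * 3600 / 1e6
read_gb_s = WRITE_GB_S * CALIBRATION_CYCLES

print(f"capacity : ~{capacity_pb:.1f} PB (baseline: ~6 PB)")
print(f"read rate: {read_gb_s} GB/s (baseline: 1650 GB/s)")
```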

For the requirements above it was assumed that the calibration will require five full cycles. The maximum data rate estimates are taken from the SKA1 high-level description document by Dewdney et al. Besides the requirements above, which are easy to enumerate, the SKA UV data store will have one idiosyncratic requirement which arises because a data re-ordering step is required between the correlator and the gridder component, i.e. where the UV data store sits in the architecture. It is possible that significant savings may be achieved by combining the data re-ordering step with the process of storing the data in the UV store. For this to be possible it is necessary that the UV data store offers a very high degree of flexibility and control over how the data are physically laid out after storage. Additionally, the pattern of data writes and reads will be extremely predictable in the case of the UV store, and there should be mechanisms in the adopted technology to make use of this predictability for maximum performance.

Analysis of Requirements

Interpretation of the requirements outlined in this document must take into account that the facilities are at different planning or construction stages. Some of the projects concerned are upgrades of existing facilities where certain solutions are already in place and new developments have to be built upon the existing infrastructure. The facilities will also serve different scientific communities, each with its own history of analysing data. Certain analysis methods are well established within communities, and the expectations of users and their experience in handling large data volumes are at different levels. Therefore, it is difficult to compile a consistent set of requirements. On the other hand, the discussions so far suggest that, although the anticipated solutions for handling data may be different, many common issues can be identified. This heterogeneity may in fact form the basis for close cooperation. Those facilities which are at the early planning phase will learn which concrete issues in handling large amounts of data are considered the most challenging elsewhere, and the more advanced projects may improve the technical realization of their solutions based on the experience of the partners.

The data rates expected to be recorded at the facilities can be grouped into three categories: relatively small data rates at ESS, ILL and SPIRAL2 (neutron- and ion-oriented physics), high data rates expected at ESRF, EuroFEL and the European XFEL (synchrotrons and FELs), and an order of magnitude higher data rates at SKA (astrophysics), although expected a few years later. A summary of the required data throughput is given in Table 3.

Facility         When    Peak data rate per experiment [GB/s]
SKA
European XFEL            (50)
EuroFEL
ESRF
Spiral2                  0.56
ESS                      0.4
ILL2020                  small

Table 3: Summary of peak data rates at different facilities.

In all cases, data needs to be transferred from the detectors to the storage systems. The preferred network infrastructure is generally based on standard 10 Gigabit Ethernet (10GE). If the required bandwidth exceeds the capacity of a single 10GE link, multiple links need to be used. Typically, the protocol required to transfer data from the detector is based on UDP. UDP is used because the protocol overhead is low, which allows for high-speed data transfer. However, this requires complex tuning of the computer systems to guarantee an acceptable packet loss rate, and a recovery policy to handle lost packets has to be defined. Assuring long-term stability and sustainability of the data transfer becomes more difficult in a multi-link setup and requires careful design of the network infrastructure.
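Because UDP gives no delivery guarantee, detecting lost packets is typically done with sequence numbers carried in the datagram itself; the sketch below is a generic, hypothetical illustration of that bookkeeping (the port, header layout and payload handling are invented and do not describe the protocol of any of the facilities above).

```python
import socket
import struct

# Minimal, illustrative UDP receiver that tracks gaps in a 64-bit sequence
# number assumed to be carried in the first 8 bytes of each datagram.

def receive(port: int = 45000, max_datagrams: int = 1000) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    expected = None
    lost = 0
    for _ in range(max_datagrams):
        datagram, _addr = sock.recvfrom(65535)
        (seq,) = struct.unpack_from("!Q", datagram, 0)   # network byte order
        if expected is not None and seq > expected:
            lost += seq - expected      # gap -> these datagrams never arrived
            # a recovery policy would go here (re-request, mark records invalid, ...)
        expected = seq + 1
        payload = datagram[8:]          # hand the payload to the assembly stage
        _ = payload
    sock.close()
    print(f"observed {lost} lost datagrams out of {max_datagrams}")

if __name__ == "__main__":
    receive()
```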

Data acquired by the detectors needs to be stored in dedicated buffers (disk storage). These buffers must be designed to cope with the maximum data rates. The highest priority is put on reliable data recording. The capacity of the buffers is typically defined based on the requirement to accumulate data for 2-3 days of operation. In some cases, such as SKA, a much shorter time is required but the data rate is an order of magnitude higher. An important role of the buffers is to minimize the interference between write and read operations. Unpredictable data access patterns can quickly lead to significant I/O performance degradation. Appropriate partitioning of the system and techniques for dealing with concurrent data access need to be established and tested.

Before data is transferred to the offline storage and archive systems, it is usually accessed for data quality monitoring, pre-processing or rejection of bad-quality data. The final proposed architecture will need to allow data processing algorithms to be included in the data acquisition chain. This processing may be initiated as soon as data is read from the detector, on the way to the buffer, or right after it is stored. In some cases the procedure can be exactly defined and is under the full control of the facility; in these cases specialized software and applications running on dedicated computing clusters, with an infrastructure and architecture optimized for the processing algorithms, are needed (e.g. at SKA). In cases where experiments are performed by external users visiting the facility for a short time, a high level of flexibility is necessary. Providing almost real-time access to data for experiment-specific data quality evaluation using well-established community tools is required. This can be realized either by limiting the access to a subset of the data, or by using analysis algorithms in a mode which is just sufficient to conclude whether the data can be fully analysed at a later stage. This data pre-analysis step must not interfere with the data recording chain, in order to guarantee that all data are safely stored. There is a clear need to find an appropriate model and technical solution to satisfy the requirement of flexible data access at high data rates.

Another common subject identified is data aggregation for multiple streams from the same or different detectors. The data aggregation technique must take into account the different frequencies at which data are generated depending on the source (e.g. pulse-related data vs. slow-control information) and the different sizes (a single scalar value vs. large image data). Merging all the streams into a single output channel may be very beneficial, as it significantly simplifies the usage of the data at the later analysis stages. However, high data rates may require storing multiple streams separately to improve writing and reading performance. In many cases data processing needs to be performed several times, and the read rates are then a few times higher than those for recording. The physical file organization and the data structure within files must therefore be optimized for fast reading rather than fast writing. Additionally, file sizes must be controlled to avoid extremes, as both very small and very large files are difficult to handle in the offline storage and in the archive.

Usually data files need to be sent from the local buffers to the shared storage system dedicated to offline analysis, or even outside the facility if the available wide-area network bandwidth is capable of handling the rate. The concurrency issues related to fast data evaluation mentioned earlier also apply to the data export services from the local buffer.

Data stored in the archive needs to be secured for a long time. In all cases raw data is meant to be immutable and a write-once-read-many (WORM) policy applies. It therefore needs to be ensured that data will not be accidentally modified or corrupted during its entire lifetime; an integrity check must detect any change in the file content. At the moment tapes are the most reliable long-term storage media, but accessing them requires special coordination in order to achieve the required performance. Data archiving strategies for different data types should also be defined to allow the best and most cost-effective usage of resources without increasing the risk of data loss.

Data protection is important for facilities which act as a service provider to users. Since user groups typically compete with each other, this aspect must be taken into account already at the data acquisition step. On the other hand, it is highly beneficial to share the data between scientists in the long term. Open access policies are also the subject of other EU-funded projects (e.g. Pan-DATA). In CRISP the work will concentrate on the technical realization of the data policies and on the requirements to protect data right at the beginning of the recording phase. A dedicated document will address these issues in more detail.

Identified synergies

Based on the analysis of the requirements, the following list of topics and related tasks has been identified as a guideline for further work within WP18:

- Using 10GE networks for high-throughput data transfers
  o Optimal network design and hardware experience
  o Experience in using various protocols like UDP or TCP (data bandwidths, error handling, analysis of packet losses)
- Online data processing models, including:
  o Data aggregation from multiple sources
  o Parallel data processing (concurrent access to memory)
  o Quality monitoring
  o Data rejection
  o Data compression
  o Data formatting
  o Fast data analysis

- Writing data to storage
  o Local buffers for safety and efficiency
  o Separation between online and offline environments
  o Testing cluster file systems
  o Concurrent access to filesystems
- Data archiving
  o Archiving strategies
  o Consistency checks
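As a closing illustration of the archiving consistency checks listed above (and of the earlier requirement that an integrity check must detect any change in file content), the sketch below builds and verifies a simple per-file checksum manifest; the manifest format is an assumption for illustration, not a CRISP deliverable.

```python
import hashlib
from pathlib import Path

# Illustrative integrity checking for an archive directory: record a SHA-256
# checksum per file at ingest time, then re-verify before/after migration.

def sha256(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def write_manifest(root: Path, manifest: Path) -> None:
    """Record '<checksum>  <relative path>' for every file under root."""
    lines = [f"{sha256(p)}  {p.relative_to(root)}"
             for p in sorted(root.rglob("*")) if p.is_file()]
    manifest.write_text("\n".join(lines) + "\n")

def verify_manifest(root: Path, manifest: Path) -> list:
    """Return the relative paths whose current checksum no longer matches."""
    corrupted = []
    for line in manifest.read_text().splitlines():
        recorded, rel = line.split("  ", 1)
        if sha256(root / rel) != recorded:
            corrupted.append(rel)
    return corrupted

# Example usage (assumes a local ./raw_data directory exists):
#   write_manifest(Path("raw_data"), Path("raw_data.sha256"))
#   assert verify_manifest(Path("raw_data"), Path("raw_data.sha256")) == []
```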


Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,

More information

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router HyperQ Hybrid Flash Storage Made Easy White Paper Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com sales@parseclabs.com

More information

Linking raw data with scientific workflow and software repository: some early

Linking raw data with scientific workflow and software repository: some early Linking raw data with scientific workflow and software repository: some early experience in PanData-ODI Erica Yang, Brian Matthews Scientific Computing Department (SCD) Rutherford Appleton Laboratory (RAL)

More information

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows Sponsored by: Prepared by: Eric Slack, Sr. Analyst May 2012 Storage Infrastructures for Big Data Workflows Introduction Big

More information

Concept and Project Objectives

Concept and Project Objectives 3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the

More information

Windows Server 2008 R2 Hyper-V Live Migration

Windows Server 2008 R2 Hyper-V Live Migration Windows Server 2008 R2 Hyper-V Live Migration White Paper Published: August 09 This is a preliminary document and may be changed substantially prior to final commercial release of the software described

More information

Frequently Asked Questions

Frequently Asked Questions Frequently Asked Questions 1. Q: What is the Network Data Tunnel? A: Network Data Tunnel (NDT) is a software-based solution that accelerates data transfer in point-to-point or point-to-multipoint network

More information

Overlapping Data Transfer With Application Execution on Clusters

Overlapping Data Transfer With Application Execution on Clusters Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer

More information

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010 Best Practices for Data Sharing in a Grid Distributed SAS Environment Updated July 2010 B E S T P R A C T I C E D O C U M E N T Table of Contents 1 Abstract... 2 1.1 Storage performance is critical...

More information

Storage of the Experimental Data at SOLEIL. Computing and Electronics

Storage of the Experimental Data at SOLEIL. Computing and Electronics Storage of the Experimental Data at SOLEIL 1 the SOLEIL infrastructure 2 Experimental Data Storage: Data Hierarchisation Close Data : beamline local access 3 to 4 days min. Recent Data : fast access, low

More information

Technical White Paper. Symantec Backup Exec 10d System Sizing. Best Practices For Optimizing Performance of the Continuous Protection Server

Technical White Paper. Symantec Backup Exec 10d System Sizing. Best Practices For Optimizing Performance of the Continuous Protection Server Symantec Backup Exec 10d System Sizing Best Practices For Optimizing Performance of the Continuous Protection Server Table of Contents Table of Contents...2 Executive Summary...3 System Sizing and Performance

More information

www.thinkparq.com www.beegfs.com

www.thinkparq.com www.beegfs.com www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a

More information

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration Solutions Integrated Storage Appliances Management Optimized Storage & Migration Archive Data Retention & Compliance Services Global Installation & Support SECURING THE FUTURE OF YOUR DATA w w w.q sta

More information

I. General Database Server Performance Information. Knowledge Base Article. Database Server Performance Best Practices Guide

I. General Database Server Performance Information. Knowledge Base Article. Database Server Performance Best Practices Guide Knowledge Base Article Database Server Performance Best Practices Guide Article ID: NA-0500-0025 Publish Date: 23 Mar 2015 Article Status: Article Type: Required Action: Approved General Product Technical

More information

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory Customer Success Story Los Alamos National Laboratory Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory June 2010 Highlights First Petaflop Supercomputer

More information

SAN/iQ Remote Copy Networking Requirements OPEN iscsi SANs 1

SAN/iQ Remote Copy Networking Requirements OPEN iscsi SANs 1 SAN/iQ Remote Copy Networking Requirements OPEN iscsi SANs 1 Application Note: SAN/iQ Remote Copy Networking Requirements SAN/iQ Remote Copy provides the capability to take a point in time snapshot of

More information

Amazon Cloud Storage Options

Amazon Cloud Storage Options Amazon Cloud Storage Options Table of Contents 1. Overview of AWS Storage Options 02 2. Why you should use the AWS Storage 02 3. How to get Data into the AWS.03 4. Types of AWS Storage Options.03 5. Object

More information

A Survey of Shared File Systems

A Survey of Shared File Systems Technical Paper A Survey of Shared File Systems Determining the Best Choice for your Distributed Applications A Survey of Shared File Systems A Survey of Shared File Systems Table of Contents Introduction...

More information

Petascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing

Petascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Petascale Software Challenges Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons

More information

Network Attached Storage. Jinfeng Yang Oct/19/2015

Network Attached Storage. Jinfeng Yang Oct/19/2015 Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability

More information

Large File System Backup NERSC Global File System Experience

Large File System Backup NERSC Global File System Experience Large File System Backup NERSC Global File System Experience M. Andrews, J. Hick, W. Kramer, A. Mokhtarani National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory

More information

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata Implementing Network Attached Storage Ken Fallon Bill Bullers Impactdata Abstract The Network Peripheral Adapter (NPA) is an intelligent controller and optimized file server that enables network-attached

More information

A Deduplication File System & Course Review

A Deduplication File System & Course Review A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror

More information

Protecting Information in a Smarter Data Center with the Performance of Flash

Protecting Information in a Smarter Data Center with the Performance of Flash 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com 212.367.7400 Protecting Information in a Smarter Data Center with the Performance of Flash IBM FlashSystem and IBM ProtecTIER Printed in

More information

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Better Together Writer: Bill Baer, Technical Product Manager, SharePoint Product Group Technical Reviewers: Steve Peschka,

More information

How to recover a failed Storage Spaces

How to recover a failed Storage Spaces www.storage-spaces-recovery.com How to recover a failed Storage Spaces ReclaiMe Storage Spaces Recovery User Manual 2013 www.storage-spaces-recovery.com Contents Overview... 4 Storage Spaces concepts and

More information

Deploying VSaaS and Hosted Solutions Using CompleteView

Deploying VSaaS and Hosted Solutions Using CompleteView SALIENT SYSTEMS WHITE PAPER Deploying VSaaS and Hosted Solutions Using CompleteView Understanding the benefits of CompleteView for hosted solutions and successful deployment architecture Salient Systems

More information

Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software

Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software The Video Edition of XenData Archive Series software manages one or more automated data tape libraries on

More information

Development of a Standardized Data-Backup System for Protein Crystallography (PX-DBS) Michael Hellmig, BESSY GmbH

Development of a Standardized Data-Backup System for Protein Crystallography (PX-DBS) Michael Hellmig, BESSY GmbH Development of a Standardized Data-Backup System for Protein Crystallography (PX-DBS) Michael Hellmig, BESSY GmbH Introduction Motivation Individual solutions for data backup at each synchrotron PX beamline

More information

Enterprise Backup and Restore technology and solutions

Enterprise Backup and Restore technology and solutions Enterprise Backup and Restore technology and solutions LESSON VII Veselin Petrunov Backup and Restore team / Deep Technical Support HP Bulgaria Global Delivery Hub Global Operations Center November, 2013

More information

Implementing Offline Digital Video Storage using XenData Software

Implementing Offline Digital Video Storage using XenData Software using XenData Software XenData software manages data tape drives, optionally combined with a tape library, on a Windows Server 2003 platform to create an attractive offline storage solution for professional

More information

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING José Daniel García Sánchez ARCOS Group University Carlos III of Madrid Contents 2 The ARCOS Group. Expand motivation. Expand

More information

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data Actifio Big Data Director Virtual Data Pipeline for Unstructured Data Contact Actifio Support As an Actifio customer, you can get support for all Actifio products through the Support Portal at http://support.actifio.com/.

More information

Deploying Riverbed wide-area data services in a LeftHand iscsi SAN Remote Disaster Recovery Solution

Deploying Riverbed wide-area data services in a LeftHand iscsi SAN Remote Disaster Recovery Solution Wide-area data services (WDS) Accelerating Remote Disaster Recovery Reduce Replication Windows and transfer times leveraging your existing WAN Deploying Riverbed wide-area data services in a LeftHand iscsi

More information

HyperQ DR Replication White Paper. The Easy Way to Protect Your Data

HyperQ DR Replication White Paper. The Easy Way to Protect Your Data HyperQ DR Replication White Paper The Easy Way to Protect Your Data Parsec Labs, LLC 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com

More information

Improvement Options for LHC Mass Storage and Data Management

Improvement Options for LHC Mass Storage and Data Management Improvement Options for LHC Mass Storage and Data Management Dirk Düllmann HEPIX spring meeting @ CERN, 7 May 2008 Outline DM architecture discussions in IT Data Management group Medium to long term data

More information

Redefining Microsoft SQL Server Data Management. PAS Specification

Redefining Microsoft SQL Server Data Management. PAS Specification Redefining Microsoft SQL Server Data Management APRIL Actifio 11, 2013 PAS Specification Table of Contents Introduction.... 3 Background.... 3 Virtualizing Microsoft SQL Server Data Management.... 4 Virtualizing

More information

Continuous Data Protection. PowerVault DL Backup to Disk Appliance

Continuous Data Protection. PowerVault DL Backup to Disk Appliance Continuous Data Protection PowerVault DL Backup to Disk Appliance Continuous Data Protection Current Situation The PowerVault DL Backup to Disk Appliance Powered by Symantec Backup Exec offers the industry

More information

Diagram 1: Islands of storage across a digital broadcast workflow

Diagram 1: Islands of storage across a digital broadcast workflow XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,

More information

POSIX and Object Distributed Storage Systems

POSIX and Object Distributed Storage Systems 1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome

More information

Avid ISIS 7000. www.avid.com

Avid ISIS 7000. www.avid.com Avid ISIS 7000 www.avid.com Table of Contents Overview... 3 Avid ISIS Technology Overview... 6 ISIS Storage Blade... 6 ISIS Switch Blade... 7 ISIS System Director... 7 ISIS Client Software... 8 ISIS Redundant

More information

NAS or iscsi? White Paper 2007. Selecting a storage system. www.fusionstor.com. Copyright 2007 Fusionstor. No.1

NAS or iscsi? White Paper 2007. Selecting a storage system. www.fusionstor.com. Copyright 2007 Fusionstor. No.1 NAS or iscsi? Selecting a storage system White Paper 2007 Copyright 2007 Fusionstor www.fusionstor.com No.1 2007 Fusionstor Inc.. All rights reserved. Fusionstor is a registered trademark. All brand names

More information

Quantifying Hardware Selection in an EnCase v7 Environment

Quantifying Hardware Selection in an EnCase v7 Environment Quantifying Hardware Selection in an EnCase v7 Environment Introduction and Background The purpose of this analysis is to evaluate the relative effectiveness of individual hardware component selection

More information

Integrated Grid Solutions. and Greenplum

Integrated Grid Solutions. and Greenplum EMC Perspective Integrated Grid Solutions from SAS, EMC Isilon and Greenplum Introduction Intensifying competitive pressure and vast growth in the capabilities of analytic computing platforms are driving

More information

Distribution One Server Requirements

Distribution One Server Requirements Distribution One Server Requirements Introduction Welcome to the Hardware Configuration Guide. The goal of this guide is to provide a practical approach to sizing your Distribution One application and

More information

Chapter. Medical Product Line Architectures 12 years of experience. B.J. Pronk Philips Medical Systems

Chapter. Medical Product Line Architectures 12 years of experience. B.J. Pronk Philips Medical Systems Chapter Medical Product Line Architectures 12 years of experience B.J. Pronk Philips Medical Systems Key words: Abstract: Example architectures, product line architectures, styles and patterns The product

More information