
CRISP WP18
Requirements for Data Recording to Storage Media
CRISP Milestone 3

Document identifier: CRISP_MS3.doc
Date: 30 June 2011
Authors: D. Boukhelef, A. Goetz, N. Ménard, R. Mudingay, B. Nikolic, J-F. Perrin, S. Skelboe, F. Schluenzen, J. Szuba, H-J. Weyer, K. Wrona

Abstract: Requirements for recording data to storage media are described for each participating Research Infrastructure. An analysis of the requirements is presented. Synergies identified are outlined.

CRISP - Cluster of Research Infrastructures for Synergies in Physics

Introduction

Rapid developments and increasingly complex experimental techniques in many scientific domains, as well as the use of highly advanced instruments and detectors, result in extremely high data rates exceeding tens of GB/s. Identifying cost-effective ways of recording data to storage systems and archives becomes an increasingly complex and challenging task. In many cases, data originate at multiple sources and must be merged in order to provide a consistent set of information suitable for further processing. Ensuring the integrity of data while it is transferred, stored, and archived becomes difficult when dealing with multiple data streams, very high data rates and large accumulated data volumes.

Applied method of gathering the requirements

The requirements described in this document were gathered via regular phone conference discussions between the participating institutes, expert discussions within the facilities, analysis of the DM&IT questionnaire and detailed discussions at the 1st annual CRISP meeting (23-25 April 2012), where WP18 organized two dedicated sessions. The first session consisted of presentations from the six participating projects: ESS, ESRF-Upgrade, EuroFEL, European XFEL, ILL2020, and SKA. Each presentation started with an introduction to the research infrastructure, allowing all participants to learn more about the general project goals, scientific programme and time schedule. The main purpose of these presentations was to explain the requirements and challenges concerning data recording and data access. The material presented consisted of a description of data sources, expected data rates, anticipated data flow diagrams, ideas for online data processing, data protection requirements and concepts for data archiving. Each presentation was followed by a short discussion. In the second session these requirements were summarized and thoroughly discussed, focusing on the identification of common areas of interest and on developing a plan for further work.

Requirements from Research Infrastructures

ESRF Upgrade

The ESRF is a third-generation synchrotron radiation source with more than 40 experiments running simultaneously. The ESRF has been operating in user mode for almost 20 years. The data volume produced has continued to grow exponentially. Figure 1 summarises the usage of the central storage disk at ESRF.

Figure 1: Usage [TB] of the central disk storage space at ESRF.

The high data-transfer rate needed in the short and medium term can be summarised as follows: data from multiple detectors (up to 3, each of up to 16 megapixels, 2 to 4 bytes per pixel) on multiple beamlines (up to 10) producing data at 200 MB/s (now) and 5 times more in the future (2 to 3 years), with sustained peak performance lasting minutes to hours. This translates to a maximum data rate of 21.7 TB/hour (6 GB/s) in the worst case, i.e. if 10 beamlines take data at full speed, each using 3 detectors simultaneously. In 3 years' time, as experience shows, this will have to be multiplied by a factor of 5, resulting in 100 TB/hour (30 GB/s) in the worst case. In reality not all beamlines use 3 detectors simultaneously and the average data rate is approximately a factor of 3 less (1 detector taking data at 10 beamlines simultaneously), which implies:

Peak write/read rate = 7 TB/h or 2 GB/s (2012) and 33 TB/h or 10 GB/s (2015).

These are sustained peak values. In reality the cycle time of experiments is much longer, which means the average is less. If we assume a reasonable factor of 10 less, then the following average values are estimated:

Average write/read rate = 700 GB/h or 0.2 GB/s (2012) and 3 TB/h or 1 GB/s (2015).

The above rates are the sum for all experiments at the ESRF. In reality we are dealing with N experiments with very different data rates. One experiment can have a high sustained data rate while another experiment produces close to nothing.
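The worst-case and average figures above follow from a few multiplications; a minimal sketch of the arithmetic, using only the detector and beamline counts quoted in the text, is shown below.

```python
# Back-of-the-envelope check of the ESRF rate estimates quoted above.
# Assumptions (taken from the text): 10 beamlines, up to 3 detectors each,
# 200 MB/s per detector today, a factor 5 increase within 2-3 years.

PER_DETECTOR_MB_S = 200          # MB/s per detector (2012)
BEAMLINES = 10
DETECTORS_WORST = 3              # worst case: 3 detectors per beamline
DETECTORS_TYPICAL = 1            # typical case: 1 detector per beamline
GROWTH = 5                       # expected growth factor by ~2015

def rate_gb_s(detectors_per_beamline, growth=1):
    """Aggregate rate in GB/s for all beamlines taking data simultaneously."""
    mb_s = PER_DETECTOR_MB_S * detectors_per_beamline * BEAMLINES * growth
    return mb_s / 1000.0

worst_now = rate_gb_s(DETECTORS_WORST)            # ~6 GB/s  (~22 TB/h)
worst_2015 = rate_gb_s(DETECTORS_WORST, GROWTH)   # ~30 GB/s (~100 TB/h)
peak_now = rate_gb_s(DETECTORS_TYPICAL)           # ~2 GB/s  (~7 TB/h)
avg_now = peak_now / 10                           # duty-cycle factor of 10

print(f"worst case now : {worst_now:.1f} GB/s ({worst_now * 3.6:.1f} TB/h)")
print(f"worst case 2015: {worst_2015:.1f} GB/s ({worst_2015 * 3.6:.1f} TB/h)")
print(f"sustained peak : {peak_now:.1f} GB/s, average ~{avg_now:.1f} GB/s")
```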

This means the proposed solution has to provide point-to-point performance which can be very high. The way the synchrotron produces data excludes spreading the data load over all experimental stations. A very important issue is to be able to analyse (read) the data as soon as possible after it was taken (written). This means that read performance has to be as good as the write performance. The following specific needs have been identified:

- dedicated buffer for caching data from the detector for fast online data processing
- dedicated buffering capacity of up to 2 days (i.e. a weekend)
- solution for exporting the user's raw and processed data automatically
- online data processing buffer big enough to hold a full experiment
- link from the online data processing PC to central storage for writing or reading results
- Linux (Debian/RHEL) and Windows (7) detector PCs
- NFS (V4+V3) and CIFS
- listing files in < 3 s
- multiple 10 Gb/s Ethernet links per beamline
- mounting user storage on the online data processing PC
- automatic export of analysed data to the user's export medium
- read speed (for data analysis) at least as good as write speed, e.g. read performance for tomography of 350 MB/s on average for 24 hours

The main needs identified based on the above are:

- a dedicated buffer for guaranteeing data rates from detectors
- an online data analysis PC with its own dedicated buffer
- automatic export of the raw and processed data to an export medium accessible to users, so that they can take the data home as soon as the experiment is over.

The dedicated buffer needs to be synchronised with the central storage for backup. It can be physically located wherever it is most convenient, e.g. in the data centre. It is important that it is dedicated to a particular experiment and does not have to be shared.
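A dedicated buffer sized for the "2 days (weekend)" requirement follows directly from the sustained rates above; a minimal sizing sketch is given below, assuming per-beamline sustained peak rates obtained by dividing the aggregate 2012 and 2015 peaks by the 10 beamlines.

```python
# Rough sizing of a per-beamline buffer able to absorb a weekend (2 days)
# of unattended data taking. The per-beamline rates are assumptions derived
# from the text (aggregate sustained peak divided by 10 beamlines).

SECONDS_PER_DAY = 24 * 3600

def buffer_tb(rate_gb_s: float, days: int = 2) -> float:
    """Capacity in TB needed to hold `days` of data at `rate_gb_s`."""
    return rate_gb_s * SECONDS_PER_DAY * days / 1000.0

for year, per_beamline_gb_s in (("2012", 0.2), ("2015", 1.0)):
    print(f"{year}: ~{buffer_tb(per_beamline_gb_s):.0f} TB for a 2-day buffer "
          f"at {per_beamline_gb_s} GB/s per beamline")
```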

Figure 2: Sketch of the data flow at ESRF.

ESS

ESS is planned to start generating neutrons with 7 instruments in operation in 2019 and to be fully configured with 22 instruments some years later. It is going to be a pulsed neutron source with a 14 Hz pulse repetition rate and 2.86 ms pulse length. With a duty cycle of 4% and an average neutron intensity similar to ILL, the neutron pulses are going to be very intense. Whether these intense pulses result in similar bursts of data depends on the design of the instruments. The following high-level requirements for data recording to storage media have been identified:

- Data collected at ESS will be metadata and neutron data or image data.
- All data will be time stamped with the least significant 32 bits of the global 64-bit clock. The most significant 32 bits will be stored whenever necessary.
- Each neutron detected will be recorded in event mode with detector location and time stamp, 32 bits for each part.
- Images will be time stamped with the information necessary to characterize the energy spectrum.
- A non-exhaustive list of metadata to be stored and maintained includes: proton-pulse data, moderator temperature, neutron flux, chopper settings, measured speed and phase, instrument settings, sample position, and sample environment including temperature, pressure, magnetic field and mechanical strain.
- The neutron data rate from an instrument may be up to 400 MB/s. The final choice of instruments has not yet been made, and therefore this number is rather uncertain.
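The 32-bit + 32-bit event-mode record and the split 64-bit time stamp described above map naturally onto a fixed 8-byte record; the sketch below is a minimal illustration (the field order and endianness are assumptions for illustration, not an ESS specification).

```python
import struct

# Illustrative 8-byte event record: 32-bit detector location (pixel ID)
# followed by the least significant 32 bits of the global 64-bit clock.
EVENT_FORMAT = "<II"        # little-endian: uint32 detector_id, uint32 time_lsb
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)   # 8 bytes per neutron event

def pack_event(detector_id: int, timestamp_64: int) -> bytes:
    """Pack one neutron event, keeping only the 32 LSBs of the clock."""
    return struct.pack(EVENT_FORMAT, detector_id, timestamp_64 & 0xFFFFFFFF)

def unpack_event(record: bytes, time_msb: int) -> tuple:
    """Recover detector id and full 64-bit time, given the stored MSBs."""
    detector_id, time_lsb = struct.unpack(EVENT_FORMAT, record)
    return detector_id, (time_msb << 32) | time_lsb

# At 400 MB/s and 8 bytes per event this corresponds to ~50 million events/s.
events_per_second = 400e6 / EVENT_SIZE
print(f"{EVENT_SIZE} B/event -> {events_per_second / 1e6:.0f} Mevents/s at 400 MB/s")
```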

The same uncertainty also applies to the maximum file size and the total amount of generated data given below. Data files for an experiment are expected not to exceed 5 TB. During the experiment, data is collected in a temporary file on a local computer dedicated to the instrument. Either the experiment is considered to have failed and the data is deleted, or the experiment is approved and the raw data is committed and transferred to permanent storage. Committed data is read-only. The collection of data must take place without interrupting or otherwise interfering with the experiment. The collection of data must permit streaming data analysis in order to provide the user with information on the progress of the experiment. The total amount of data collected per year from 22 ESS instruments is expected not to exceed 5 PB. The permanent storage system should be hierarchical and hold recent data and data being analysed on disk. Less frequently used data may be migrated to tape until it is possibly used again. Data is expected to be stored forever.

EuroFEL

The EuroFEL project is a joint effort of 7 partners involved in the construction or operation of a Free Electron Laser facility. Most of the FELs are still under construction or in the commissioning phase. Precise figures for data requirements from FELs are hence difficult to project. However, most of the EuroFEL members also operate a synchrotron light source, and experiments performed at LCLS can provide a fair account of current and future data requirements. Requirements derived from the DESY facilities serve as an example of EuroFEL requirements and are summarized below.

DESY currently operates two synchrotron-light sources (DORIS, Petra III) and a VUV FEL, more than 50 instruments in total. DESY is also a stakeholder of the European XFEL GmbH. However, requirements from the European XFEL are not taken into account here, though they fully apply as well. PSI has a research environment similar to DESY: it runs a synchrotron (SLS), and the free-electron-laser project SwissFEL is under development. Thus, the conclusions from DESY in this paper can be used for PSI as well. The DESY-CFEL group is a quite demanding user group running experiments at various synchrotrons in Europe and the US, and in particular at the X-ray FEL LCLS. The experience with LCLS may well give an impression of what kind of data rates and volumes are expected from fully operational X-ray FELs. During the last two years, DESY-CFEL did a small number of experiments at LCLS, each covering typically 2-4 weeks of beamtime. The accumulated amount of data created, transferred to and archived at DESY is shown in Figure 3.

Figure 3: Data taken by DESY-CFEL at LCLS during the last two years.

The sustained data rate is of the order of terabytes per week; the total volume is 700 TB. For the synchrotron-light sources, the PNI-HDRI high-data-rate project (a joint project of German HGF RIs) made estimates and projections for current and future data rates and volumes for selected experimental techniques. The estimates are summarized in Table 1. These estimates were made before the Petra III beamlines became fully operational. Meanwhile the in-situ imaging beamline P02 is running a Perkin Elmer detector with 15 frames (16 MB/frame) per second. Peak rates are hence at ~240 MB/s, with sustained averages of ~200 MB/s for several days to weeks. The protein-crystallography beamline at Petra III (P11) uses a Pilatus 6M detector, which can operate at a frame rate of 25 Hz. The peak data rates are around 350 MB/s; the average data rates depend on the mode of operation. The estimates originally made for Petra III were actually not far off, and the on-going detector developments will further increase the data rates within the coming years. Currently DESY plans for a storage infrastructure with 1.6 PB per year for Photon Science research data at DESY.
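The per-detector figures quoted above follow from frame size times frame rate; a small sketch of that arithmetic is shown below, using the Perkin Elmer parameters given in the text (the 7-day run length in the volume estimate is an assumption standing in for "several days to weeks").

```python
# Peak detector data rate = frame size x frame rate.
# Numbers for the Perkin Elmer detector at the P02 beamline are taken
# from the text (16 MB/frame at 15 frames per second).

def peak_rate_mb_s(frame_mb: float, frames_per_s: float) -> float:
    """Peak data rate in MB/s for a detector streaming full frames."""
    return frame_mb * frames_per_s

perkin_elmer = peak_rate_mb_s(frame_mb=16, frames_per_s=15)
print(f"Perkin Elmer @ P02: ~{perkin_elmer:.0f} MB/s peak")   # ~240 MB/s

# Accumulated volume for a multi-day run at a sustained average of 200 MB/s:
days = 7                                   # assumed run length for illustration
volume_tb = 200e6 * 86400 * days / 1e12
print(f"{days} days at 200 MB/s -> ~{volume_tb:.0f} TB")
```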

Table 1: Requirements in terms of data rates and processing time for various applications at synchrotron sources, gathered by the PNI-HDRI project.

Data streams from multiple detectors need to be aggregated and transferred. The peak data rates are estimated to reach up to 18 TB/h (5 GB/s), with sustained average data rates of up to 4 TB/h (1 GB/s). For some FEL experiments, peak and sustained rates could reach up to 30 TB/h (8 GB/s) for a single experiment. Summing up the data rates from instruments at Petra III (not including FEL data):

Sustained peak write/read rate = 1-10 TB/h or 0.3-3 GB/s (2012) and 10-50 TB/h or 3-15 GB/s (2015).

The number of files created can exceed 10^5/s (2012) and 10^8/s (2015).

These are sustained peak values. The sustained average rates are certainly lower at some instruments, where some modification of the equipment between experiments is required, or for beamlines where sample handling requires lengthy manual interventions. Some beamlines can however operate without significant interruption (e.g. tomography or crystallography equipped with automatic sample changers and pre-characterized samples). Hence estimates for average rates cover a wider range:

Average write/read rate = 1-25 TB/h or 0.3-8 GB/s (2015), with correspondingly lower values for 2012.

Naturally, data rates heavily depend on the experimental technique and instrument. However, one important aspect is the concurrency of the data streams. The number of beamlines at DESY is currently about 50. Each of the beamlines produces an independent data stream; peak data rates can occur simultaneously, but don't have to. Balancing data loads over beamlines is not an option, and data streams from different experiments should not interfere at all. An important issue is to be able to analyse (i.e. read) the data, or at least samples of the data, as soon as possible after it is taken (i.e. written). This means that read performance has to be as good as write performance. At DESY the dCache system is used as the storage/archive backend and, wherever feasible, also for online processing. At SLS a GPFS-based system is used for data storage. The choice of system has a certain impact on the requirements in terms of data aggregation and supported protocols, but should not affect the basic requirements, since the data flow model is very similar regardless of the system in use. A typical data flow is shown in Figure 4. The requirements are also very similar to those given by ESRF:

- Dedicated buffer to keep up with data rates from the detector
  o data loss is not acceptable
  o permit fast online data processing
  o enable image corrections, trigger and quality indicators
  o enable compression, conversion, data aggregation
- Dedicated buffering capacity to cover at least 2-3 days
  o experiments should not be hampered by data export
- Support for Linux and Windows (7) detector PCs
  o Windows e.g. PCO, Perkin Elmer, PSI, Roper Scientific
  o Linux e.g. Pilatus, Maxipix, MAR, LCX
- Online data processing buffer
  o Large enough to cover a significant number of experiments. Processing time can exceed the time to create the data by orders of magnitude.
  o In some cases, support for parallel/cluster file systems and MPI I/O is preferable.
  o No interference between online processing and the experiment.
  o Permit initial real-time analysis for quality assessments.

- Link from the online data processing PC to central storage to write or read results
  o Decouple data transfer to storage from the experiment.
  o Data should be available for offline analysis essentially as soon as an experiment has been terminated.
- Rapid transfer of data downstream (dCache) without interference with the experiment or online analysis
- Support for ACLs and several protocols like NFS V4.1 and V3, CIFS, WebDAV
  o Data protection required at all stages.
- Speed:
  o Read speed (for data analysis) at least as good as write speed; 500 MB/s for several days from a single experiment.
- Export of analysed data to the user's export medium
  o Support for various transfer protocols required.
  o A replication service might become beneficial.
- Automatic registration of raw and analysed data in a data catalogue
- Scalability:
  o The number of instruments/beamlines is rapidly growing.
  o The next generation of X-ray detectors, e.g. the Pilatus successor Eiger, can operate at frame rates of up to 22 MHz per array. The Eiger detector will be composed of several arrays. Each array will produce up to 5 GB/s. Even worse, it might produce several million files per second. The speed of metadata operations on the filesystem might become an issue.
- Reliability/availability:
  o Beamtime is precious and costly.
  o Data are irreproducible in some cases.
  o Data integrity needs to be guaranteed at all stages of the data chain.
- Cost efficiency:
  o The data volumes to be kept online are rapidly increasing, due to the increasing size of a single dataset, the increasing online compute capabilities and the increasing complexity (i.e. time to analyse a dataset).
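Several items in the list above amount to decoupling detector writes from downstream transfer through a dedicated buffer. The sketch below is a hypothetical illustration of that pattern, with a bounded in-memory queue standing in for the disk buffer; names, sizes and the storage step are made up and do not represent a DESY implementation.

```python
import queue
import threading

# Illustrative decoupling of detector writes from downstream transfer:
# the "detector" must never block on central storage, so a bounded buffer
# (standing in for the dedicated disk buffer) absorbs bursts while an
# independent "transfer" thread drains it.

buffer = queue.Queue(maxsize=256)
SENTINEL = None

def detector(n_frames: int, frame_size: int = 1 << 20) -> None:
    """Produce synthetic frames and hand them to the buffer."""
    for frame_id in range(n_frames):
        frame = bytes(frame_size)            # placeholder 1 MB payload
        buffer.put((frame_id, frame))        # a real system would alarm on overflow
    buffer.put(SENTINEL)

def transfer() -> None:
    """Drain the buffer towards central storage at its own pace."""
    while (item := buffer.get()) is not SENTINEL:
        frame_id, frame = item
        _ = (frame_id, len(frame))           # here: write to dCache/GPFS, verify, release

producer = threading.Thread(target=detector, args=(100,))
consumer = threading.Thread(target=transfer)
producer.start(); consumer.start()
producer.join(); consumer.join()
print("all frames handed off through the buffer")
```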

Figure 4: Example of the data flow at a Petra III beamline.

European XFEL

The requirements for high-speed data recording at the European XFEL are driven by the parameters of the photon beams, the characteristics of the detectors used and the operation modes of the experiments. A schematic view of the photon beamlines is presented in Figure 5. According to the design, 3 concurrent experiments may be performed. The time profile of the photon beam is depicted in Figure 6. The train rate is 10 Hz, i.e. 100 ms between train heads, and the train length is 600 µs. The maximal bunch rate within a train is 4.5 MHz. Other bunch rates can be produced by removing or not creating bunches.

The 2D pixel detectors place the largest demands on data recording. Their main characteristics are:

- a 1 megapixel detector with a pixel data size of 2 B, resulting in a 2 MB frame size;
- data read from the detector through custom hardware (the train builder) at a maximum rate of 512 frames per train;
- data sent through 16 x 10GE links, giving a total readout rate of 10 GB/s.

The development of larger detectors (2k x 2k = 4M pixels) and an increase of the number of frames per train to 1024 can be expected.
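The 10 GB/s figure follows from the train structure above; a minimal arithmetic sketch is given below (the 4 Mpixel, 1024-frames-per-train case is an extrapolation of the growth path mentioned in the text, not a design value).

```python
# Readout rate of a European XFEL 2D pixel detector from the train structure
# described above: frames/train x frame size x trains/second.

TRAIN_RATE_HZ = 10            # trains per second

def readout_gb_s(pixels: int, bytes_per_pixel: int, frames_per_train: int) -> float:
    frame_bytes = pixels * bytes_per_pixel
    return frame_bytes * frames_per_train * TRAIN_RATE_HZ / 1e9

current = readout_gb_s(pixels=1024 * 1024, bytes_per_pixel=2, frames_per_train=512)
future = readout_gb_s(pixels=2048 * 2048, bytes_per_pixel=2, frames_per_train=1024)

print(f"1 Mpixel, 512 frames/train : ~{current:.1f} GB/s")   # ~10 GB/s over 16 x 10GE
print(f"4 Mpixel, 1024 frames/train: ~{future:.1f} GB/s")    # extrapolated growth case
```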

Figure 5: Planned photon beamlines at the European XFEL.

Figure 6: Time structure of the X-ray beam at the European XFEL.

The most demanding instrument in terms of data recording, SPB, will in its initial configuration consist of a number of 4.5 MHz-capable detectors:

- a 1024 x 1024 pixel 2D camera for imaging
- potentially a smaller, possibly 256 x 256 pixel, detector as a wavefront monitor
- a single eTOF digitizer (10 GS/s with 10-bit resolution)
- a single-channel APD-type fluorescence detector

In addition, a number of general beam diagnostic devices will supply data streams which need to be correlated with the instrument detectors. Depending on the operational mode, the data volume accumulated per day may vary from several TB up to 400 TB.

There is a clear need to identify and reject, as soon as possible, data which are not useful for further analysis. These may be data from bunches where the FEL pulse has not interacted with a sample, or where the events were not clean enough. Due to limitations in the sample delivery methods for some types of experiments, the fraction of good events may be of the order of a few percent only. Although future developments may improve delivery mechanisms, resulting in a higher hit rate, the technique of bad-data rejection must be planned and incorporated directly into the data acquisition chain. Rearrangement of internal detector data, building of complete data frames, data rejection, calibration, data formatting, compression, background discrimination, and consistency checks will require significant processing capabilities to be available within the data recording chain.

The disk storage system must be capable of saving data from the experiments at both the online and offline stages. The online storage should allow for immediate data access in order to be able to assess the data quality and perform data processing in a manageable manner. Online storage also serves as a local buffer if the connection to the offline storage located in the computer centre is disrupted. The estimated capacity of the online storage buffer is 0.5 PB per experiment. The offline storage should serve as a source of data for reconstruction and user analysis, giving free or semi-managed access on a much longer time scale. Data collected from experiments must be kept on a storage system as long as it is required by analysis or until the data is exported to the user's home institute. If the analysis is conducted on site, the possibility of storing and accessing temporary data created during analysis as well as the final analysis results must be provided to users. The access protocols should preferably be standardized (e.g. NFS 4.1). The final stage of data recording is an archive, i.e. a secure and long-term data storage system. Restoring data from an archive should be done in a managed way. The archive system implementation considered initially is based on tape media and dCache. Data reconstruction and analysis will require CPU- and GPU-based computing clusters with optimized access paths to the offline storage. Local node storage caches as well as a cluster file system for fast data access are needed.

The requirements on the data recording system can be summarized as follows:

- The data acquisition system must be able to send data through multiple 10GE links using the UDP protocol.
- The aggregated speed of formatting and writing data to files must be sufficient to sustain the design acquisition rate (10 GB/s per detector).
- A possibility must exist to reject single or multiple records based on information obtained from the veto system.
- The format must be self-describing, encoded in a platform-independent way, and based on software tools recognized by the scientific communities; the initial candidate is HDF5.
- Internal compression of single records as well as collections of records should be possible.
- Data from multiple sources must be correlated using train and bunch numbers.
- Almost real-time access to data for experiment-specific data evaluation is required.
- Control of the file size is needed. Files will contain multiple data records and images. Multiple trains per file must be possible.
- Raw data files will be immutable.
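As an illustration of several of the requirements above (self-describing HDF5 files, per-train records, a veto-based rejection mask, internal compression), the sketch below writes a few synthetic trains to an HDF5 file with h5py. The file layout and dataset names are invented for illustration and are not the European XFEL data format.

```python
import numpy as np
import h5py

# Illustrative only: synthetic frames, a per-frame "good" mask standing in for
# the veto decision, and gzip-compressed, extensible datasets.
FRAMES_PER_TRAIN = 8            # far below the real 512, to keep the example small
FRAME_SHAPE = (1024, 1024)

with h5py.File("trains_example.h5", "w") as f:
    images = f.create_dataset(
        "INSTRUMENT/detector/image",
        shape=(0, *FRAME_SHAPE), maxshape=(None, *FRAME_SHAPE),
        dtype="uint16", chunks=(1, *FRAME_SHAPE), compression="gzip")
    train_ids = f.create_dataset("INDEX/trainId", shape=(0,), maxshape=(None,),
                                 dtype="uint64")
    for train_id in range(3):                        # three synthetic trains
        frames = np.random.randint(0, 1 << 12,
                                   size=(FRAMES_PER_TRAIN, *FRAME_SHAPE),
                                   dtype=np.uint16)
        good = np.random.rand(FRAMES_PER_TRAIN) < 0.1   # veto rejects ~90% of frames
        kept = frames[good]
        if len(kept) == 0:
            continue                                 # nothing useful in this train
        n0 = images.shape[0]
        images.resize(n0 + len(kept), axis=0)
        images[n0:] = kept
        train_ids.resize(n0 + len(kept), axis=0)
        train_ids[n0:] = train_id

print("wrote trains_example.h5")
```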

SPIRAL2

Considering the complexity of the future detectors for SPIRAL2, most of them will have dedicated electronics. Some of them will be located at GANIL, but most of them will also be used in different laboratories. It is therefore important to consider the concept of DAQ subsystems which can be interconnected when detectors are operated together. Most of the new detectors will provide high data flows from different branches, which will require online event building to merge the branches and filtering to reduce the amount of data to be stored; a simple sketch of such a merge step is given after Table 2. For some of the detectors, an evaluation of the data rates and of the amount of data to be stored is given in Table 2.

                 Data bandwidth [MB/s]      Data generated [TB/day]
Detector         Raw         Filtered       Raw          Filtered     Comment
AGATA                                                                 Evaluation for GANIL phase with 15 Triple Clusters (2014)
EXOGAM2          30 to 90                   3 to 8                    Depends on scenarios
NEDA             15 to                      to 5.5                    Depends on embedded compression algorithms and scenarios
ACTAR S

Table 2: Provisional data bandwidth for SPIRAL2 detectors.
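The online event building mentioned above amounts to merging several time-ordered branch streams into one stream and discarding events that fail a filter; the sketch below is a minimal, hypothetical illustration in generic Python, not the GANIL/SPIRAL2 DAQ software.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# A "branch" is modelled as a time-ordered iterator of (timestamp, branch_id, payload).
Event = Tuple[int, str, bytes]

def build_events(branches: Iterable[Iterator[Event]],
                 keep=lambda e: True) -> Iterator[Event]:
    """Merge time-ordered branches into one stream and drop filtered-out events.

    heapq.merge performs the k-way merge on the leading timestamp, which is the
    essence of event building; `keep` stands in for the online filter that
    reduces the volume written to storage.
    """
    for event in heapq.merge(*branches):          # sorted by timestamp first
        if keep(event):
            yield event

# Tiny usage example with two synthetic branches and a trivial filter.
branch_a = iter([(10, "A", b"..."), (30, "A", b"..."), (50, "A", b"...")])
branch_b = iter([(20, "B", b"..."), (40, "B", b"...")])
merged = list(build_events([branch_a, branch_b], keep=lambda e: e[0] % 20 != 0))
print(merged)    # events with timestamps 10, 30, 50 survive the filter
```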

Considering that some of these detectors can be coupled, the data bandwidth to be considered is of the order of a few hundred MB/s. One also needs to take into account that there can be 2 experiments running at the same time. Depending on the collaboration, GANIL/SPIRAL2 will have to provide all or part of the network infrastructure and storage. For instance, the AGATA collaboration will provide its own data storage system, but asks for data bandwidth to back up to an external computer centre. The requirements for data storage at GANIL/SPIRAL2 for the near future are the following:

- Data input: 2 x 300 MB/s (5 Gbit/s effective, 10 Gbit/s to be considered). The local network has to be upgraded to manage this bandwidth in the experimental areas.
- Data output to be defined to enable quasi-online data analysis on multiple clients with 1GE network interfaces. The local network has to be upgraded to manage this bandwidth in the experimental areas and in the main building.
- Several hundreds of TB of extensible storage in a highly available architecture (24x7) to store the experimental data of the current campaign.
- Possibility to send data to a data centre such as CC-IN2P3 (several TB/day), or to store experimental data locally for at least a year for analysis.
- Experimental data have to be shared by heterogeneous clients (Linux, Windows and Mac OS X) with a standard protocol such as NFS.
- User identification for both local and remote access to data; rights for users and groups have to be managed with a standard protocol such as LDAP.

Depending on future decisions, it could be necessary to have a computing farm dedicated to the reduction of the data flow, a computing farm for data analysis, and a storage and backup infrastructure for a complete year (more than 500 TB).

ILL2020

The Institut Laue-Langevin operates one of the most intense neutron sources in the world, feeding intense beams of neutrons to a suite of 40 high-performance instruments that are constantly upgraded. An ambitious modernization programme was launched in 2000, through the design of new neutron infrastructure and the introduction of new instruments and instrument upgrades. The first phase resulted in 17-fold gains in performance. The second phase began in 2008 and comprises the building of 5 new instruments, the upgrade of 4 others, and the installation of 3 new neutron guides.

Standard data workflow

During a typical experiment data flow, acquired data is stored on the local buffer, which is also used as the instrument control computer. This permits easy access to the data for analysis and calibration. Data is also sent remotely to the archive system. Access to experimental data can therefore be gained by experimenters either through the local instrument buffer or on the central archive.

Volume of data

Due to continuous detector improvements there is a regular but slow increase in the volume of experimental data. Since 2011, some instruments have begun to generate a greater volume of data, which is expected to continue to grow towards 30 TB by the end of 2012 (40 times the usual volume per cycle). The IT service has to take this into consideration, in terms of storage but also in terms of infrastructure (network, backup, workstation capacities) and the general scientific workflow. We are leaving the world of 1-3 TB per year towards a higher output of data. Our projections are not perfectly precise at the time of writing this paper, but what is certain is that this evolution will seriously impact the IT infrastructure.

Figure 7: Evolution of data volume at ILL (Gb per cycle, with exponential and linear trend lines).

Introduction of the ILL data policy

In December 2011 the ILL introduced a Data Policy in order to increase the scientific value of the data by opening it up to a wider community for further analysis and fostering new collaborations between scientific groups.

A necessary embargo period was introduced in order to provide time for the users to finalize and publish their work.

Security as the main challenge

The recent introduction of the ILL data policy and the different projects towards a better annotation of experimental data have raised the need for security and regulated access control. This implies the introduction of strict Access Control Lists (ACLs) on the different data file storage units. Those ACLs should be automatically derived from the proposal database, taking into consideration the users taking part in the experiments. This also implies provisioning different secure protocols to match the heterogeneous operating systems used by the scientists and imposed by workflows. Amongst the different protocols (rsync, CIFS, NFS), a special focus will be put on NFS. We need to ensure that it is the user who is authenticated and not only the workstation; this has not been the case with previous versions of the NFS protocol. The latest version of NFS (release 4) introduces the possibility to integrate with a Kerberos service as a means to authenticate users and protect access to data. Inside the work package we intend to study the feasibility and consequences of implementing a Kerberos and NFSv4 infrastructure as a viable means of securing access to experimental data files.
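The automatic derivation of ACLs from the proposal database described above can be pictured as a simple mapping from proposal participants to per-dataset access entries. The sketch below is purely illustrative: the proposal record layout and the NFSv4-style ACE strings are assumptions, not the ILL schema or configuration.

```python
from dataclasses import dataclass, field

# Hypothetical proposal record; the real ILL proposal database schema differs.
@dataclass
class Proposal:
    number: str
    main_proposer: str
    participants: list = field(default_factory=list)

def nfs4_aces(proposal: Proposal, domain: str = "example.org") -> list:
    """Build NFSv4-style allow entries for everyone named on the proposal.

    Entry format type:flags:principal:permissions (read-oriented permissions
    only), used purely to illustrate ACLs derived from proposal metadata.
    """
    users = [proposal.main_proposer, *proposal.participants]
    return [f"A::{user}@{domain}:rxtncy" for user in users]

p = Proposal(number="8-01-123", main_proposer="alice", participants=["bob", "carol"])
for ace in nfs4_aces(p):
    print(ace)
# During the embargo period only these entries would be applied to the dataset
# directories; after the embargo a wider-community entry could be added.
```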

SKA

The Square Kilometre Array (SKA) is the next-generation radio telescope, to be built by a large international consortium (currently including the UK and the Netherlands from the EU) and currently in the design stages. The SKA will have about five square kilometres of collecting area and very advanced computational facilities, giving it a sensitivity about 100 times better than the best current telescopes and a sky survey speed a million times greater than current facilities. These improvements over current capabilities will allow the SKA to observe high-energy processes and atomic hydrogen in the universe out to the epoch of re-ionisation, giving us a new view into fundamental physics, astronomy and cosmology.

Figure 8: Illustration of the data flow for the SKA1 experiment "Epoch of Reionisation", with estimated data flow rates (which are below the maximum estimated rates for other experiments).

The computational load of processing the data received by the SKA receptors is likely to be a limiting factor in the scientific capabilities of the telescope, and therefore the scientific data processing system is one of the key R&D sub-projects within the on-going SKA design. One of the key challenges is the combination of very high data rates and the iterative nature of the calibration algorithm of the telescope. This means that a fully streaming architecture is not possible and instead data must be stored in a UV data store for the duration of a single observation (typically about five hours). This UV data store must support very fast data recording and retrieval and is the main part of the SKA which will benefit from CRISP WP18. The requirements listed below are current baseline requirements derived from the analysis of some SKA1 experiments. It is likely that these will evolve somewhat over time as the design process continues, but they are likely to be representative:

- Write throughput: Maximum expected write data rate of 330 GB/s.
- Read throughput: Maximum expected read data rate of 1650 GB/s.
- Storage duration: Expected duration of storage of data is 5 hours.
- Storage capacity: Required capacity is about 6 PB.
- Hardware interfaces: Industry-standard hardware interfaces suitable for use with supercomputers.
- Data format: The data format is most likely to be a custom binary format, although HDF5 or some derived technology might be used.
- Security: No security/quota/user permissions required, as both ends are directly controlled by SKA software.
- Real-time: Soft real-time write operation, as buffering in the correlator is likely to be limited. Consistent, predictable performance for both reads and writes is required.
- Reliability: Loss of binary data is not a problem as long as it is correctly flagged and does not cause hold-ups to processing.
- Deployment timeframe: Full production is expected at the beginning of ...; design reviews are scheduled in ...
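The storage capacity and read throughput above are consistent with the write rate, the five-hour observation and the five calibration passes mentioned in the following paragraph; a small consistency-check sketch is given below.

```python
# Consistency check of the SKA UV data store baseline numbers quoted above.

WRITE_GB_S = 330          # maximum expected write rate
OBSERVATION_H = 5         # data kept for the duration of one observation
CALIBRATION_CYCLES = 5    # the calibration is assumed to need five full passes

capacity_pb = WRITE_GB_S * OBSERVATION_H * 3600 / 1e6
read_gb_s = WRITE_GB_S * CALIBRATION_CYCLES

print(f"capacity : ~{capacity_pb:.1f} PB (baseline: ~6 PB)")
print(f"read rate: {read_gb_s} GB/s (baseline: 1650 GB/s)")
```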

For the requirements above it was assumed that the calibration will require five full cycles. The maximum data rate estimates are taken from the SKA1 high-level description document by Dewdney et al. Besides the requirements above, which are easy to enumerate, the SKA UV data store will have one idiosyncratic requirement which arises because a data re-ordering step is required between the correlator and the gridder component, i.e. where the UV data store sits in the architecture. It is possible that significant savings may be achieved by combining the data re-ordering step with the process of storing the data in the UV store. For this to be possible it is necessary that the UV data store offers a very high degree of flexibility and control over how the data are physically laid out after storage. Additionally, the pattern of data writes and reads will be extremely predictable in the case of the UV store, and there should be mechanisms in the adopted technology to make use of this predictability for maximum performance.

Analysis of Requirements

Interpretation of the requirements outlined in this document must take into account that the facilities are at different planning or construction stages. Some of the projects concerned are upgrades of existing facilities where certain solutions are already in place and new developments have to be built upon the existing infrastructure. The facilities will also serve different scientific communities, each with its own history of analysing data. Certain analysis methods are well established within communities, and the expectations of users and their experience in handling large data volumes are at different levels. Therefore, it is difficult to compile a consistent set of requirements. On the other hand, the discussions so far suggest that, although the anticipated solutions for handling data may be different, many common issues can be identified. This heterogeneity may in fact form the basis for close cooperation. Those facilities which are at the early planning phase will learn which concrete issues in handling large amounts of data are considered the most challenging elsewhere, and the more advanced projects may improve the technical realization of their solutions based on the experience of the partners.

The data rates expected to be recorded at the facilities can be grouped into three categories: relatively small data rates at ESS, ILL and SPIRAL2 (neutron- and ion-oriented physics), high data rates expected at ESRF, EuroFEL and the European XFEL (synchrotrons and FELs), and an order of magnitude higher data rates at SKA (astrophysics), although expected a few years later. A summary of the required data throughput is given in Table 3.

Facility         When    Peak data rate per experiment [GB/s]
SKA
European XFEL            (50)
EuroFEL
ESRF
Spiral2                  0.56
ESS                      0.4
ILL2020                  small

Table 3: Summary of peak data rates at different facilities.

In all cases, data needs to be transferred from the detectors to the storage systems. The preferred network infrastructure is generally based on standard 10 Gigabit Ethernet (10GE). If the required bandwidth exceeds the capacity of a single 10GE link, multiple links need to be used. Typically, the protocol required to transfer data from the detector is based on UDP. UDP is used because the protocol overhead is low, which allows for high-speed data transfer. However, this requires complex tuning of the computer systems to guarantee an acceptable packet loss rate, and a recovery policy to handle lost packets has to be defined. Assuring long-term stability and sustainability of the data transfer becomes more difficult in a multi-link setup and requires careful design of the network infrastructure.
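Because UDP gives no delivery guarantee, detecting lost packets is typically done with sequence numbers carried in the datagram itself; the sketch below is a generic, hypothetical illustration of that bookkeeping (the port, header layout and payload handling are invented and do not describe the protocol of any of the facilities above).

```python
import socket
import struct

# Minimal, illustrative UDP receiver that tracks gaps in a 64-bit sequence
# number assumed to be carried in the first 8 bytes of each datagram.

def receive(port: int = 45000, max_datagrams: int = 1000) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    expected = None
    lost = 0
    for _ in range(max_datagrams):
        datagram, _addr = sock.recvfrom(65535)
        (seq,) = struct.unpack_from("!Q", datagram, 0)   # network byte order
        if expected is not None and seq > expected:
            lost += seq - expected      # gap -> these datagrams never arrived
            # a recovery policy would go here (re-request, mark records invalid, ...)
        expected = seq + 1
        payload = datagram[8:]          # hand the payload to the assembly stage
        _ = payload
    sock.close()
    print(f"observed {lost} lost datagrams out of {max_datagrams}")

if __name__ == "__main__":
    receive()
```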

Data acquired by the detectors needs to be stored in dedicated buffers (disk storage). These buffers must be designed to cope with the maximum data rates. The highest priority is put on reliable data recording. The capacity of the buffers is typically defined based on the requirement to accumulate data for 2-3 days of operation. In some cases, such as SKA, a much shorter time is required but the data rate is an order of magnitude higher. An important role of the buffers is to minimize the interference between write and read operations. Unpredictable data access patterns can quickly lead to significant I/O performance degradation. Appropriate partitioning of the system and techniques for dealing with concurrent data access need to be established and tested.

Before data is transferred to the offline storage and archive systems, it is usually accessed for data quality monitoring, pre-processing or rejection of bad-quality data. The final proposed architecture will need to allow data processing algorithms to be included in the data acquisition chain. This processing may be initiated as soon as data is read from the detector, on the way to the buffer, or right after it is stored. In some cases the procedure can be exactly defined and is under the full control of the facility; in these cases specialized software and applications running on dedicated computing clusters, with an infrastructure and architecture optimized for the processing algorithms, are needed (e.g. at SKA). In cases where experiments are performed by external users visiting the facility for a short time, a high level of flexibility is necessary. Providing almost real-time access to data for experiment-specific data quality evaluation using well-established community tools is required. This can be realized either by limiting the access to a subset of the data, or by using analysis algorithms in a mode which is just sufficient to conclude whether the data can be fully analysed at a later stage. This data pre-analysis step must not interfere with the data recording chain, in order to guarantee that all data are safely stored. There is a clear need to find an appropriate model and technical solution to satisfy the requirement of flexible data access at high data rates.

Another common subject identified is data aggregation for multiple streams from the same or different detectors. The data aggregation technique must take into account the different frequencies at which data are generated depending on the source (e.g. pulse-related data vs. slow-control information) and the different sizes (a single scalar value vs. large image data). Merging all the streams into a single output channel may be very beneficial, as it significantly simplifies the usage of the data at the later analysis stages. However, high data rates may require storing multiple streams separately to improve writing and reading performance. In many cases data processing needs to be performed several times, and the read rates are then a few times higher than those for recording. The physical file organization and the data structure within files must therefore be optimized for fast reading rather than fast writing. Additionally, file sizes must be controlled to avoid extremes, as both very small and very large files are difficult to handle in the offline storage and in the archive.

Usually data files need to be sent from the local buffers to the shared storage system dedicated to offline analysis, or even outside the facility if the available wide-area network bandwidth is capable of handling the rate. The concurrency issues related to fast data evaluation mentioned earlier also apply to the data export services from the local buffer.

Data stored in the archive needs to be secured for a long time. In all cases raw data is meant to be immutable and a write-once-read-many (WORM) policy applies. It therefore needs to be ensured that data will not be accidentally modified or corrupted during its entire lifetime; an integrity check must detect any change in the file content. At the moment tapes are the most reliable long-term storage media, but accessing them requires special coordination in order to achieve the required performance. Data archiving strategies for different data types should also be defined to allow the best and most cost-effective usage of resources without increasing the risk of data loss.

Data protection is important for facilities which act as a service provider to users. Since user groups typically compete with each other, this aspect must be taken into account already at the data acquisition step. On the other hand, it is highly beneficial to share the data between scientists in the long term. Open access policies are also the subject of other EU-funded projects (e.g. Pan-DATA). In CRISP the work will concentrate on the technical realization of the data policies and on the requirements to protect data right at the beginning of the recording phase. A dedicated document will address these issues in more detail.

Identified synergies

Based on the analysis of the requirements, the following list of topics and related tasks has been identified as a guideline for further work within WP18:

- Using 10GE networks for high-throughput data transfers
  o Optimal network design and hardware experience
  o Experience in using various protocols like UDP or TCP (data bandwidths, error handling, analysis of packet losses)
- Online data processing models, including:
  o Data aggregation from multiple sources
  o Parallel data processing (concurrent access to memory)
  o Quality monitoring
  o Data rejection
  o Data compression
  o Data formatting
  o Fast data analysis

- Writing data to storage
  o Local buffers for safety and efficiency
  o Separation between online and offline environments
  o Testing cluster file systems
  o Concurrent access to filesystems
- Data archiving
  o Archiving strategies
  o Consistency checks
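As a closing illustration of the archiving consistency checks listed above (and of the earlier requirement that an integrity check must detect any change in file content), the sketch below builds and verifies a simple per-file checksum manifest; the manifest format is an assumption for illustration, not a CRISP deliverable.

```python
import hashlib
from pathlib import Path

# Illustrative integrity checking for an archive directory: record a SHA-256
# checksum per file at ingest time, then re-verify before/after migration.

def sha256(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def write_manifest(root: Path, manifest: Path) -> None:
    """Record '<checksum>  <relative path>' for every file under root."""
    lines = [f"{sha256(p)}  {p.relative_to(root)}"
             for p in sorted(root.rglob("*")) if p.is_file()]
    manifest.write_text("\n".join(lines) + "\n")

def verify_manifest(root: Path, manifest: Path) -> list:
    """Return the relative paths whose current checksum no longer matches."""
    corrupted = []
    for line in manifest.read_text().splitlines():
        recorded, rel = line.split("  ", 1)
        if sha256(root / rel) != recorded:
            corrupted.append(rel)
    return corrupted

# Example usage (assumes a local ./raw_data directory exists):
#   write_manifest(Path("raw_data"), Path("raw_data.sha256"))
#   assert verify_manifest(Path("raw_data"), Path("raw_data.sha256")) == []
```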


Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,

More information

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router HyperQ Hybrid Flash Storage Made Easy White Paper Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com sales@parseclabs.com

More information

Linking raw data with scientific workflow and software repository: some early

Linking raw data with scientific workflow and software repository: some early Linking raw data with scientific workflow and software repository: some early experience in PanData-ODI Erica Yang, Brian Matthews Scientific Computing Department (SCD) Rutherford Appleton Laboratory (RAL)

More information

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows Sponsored by: Prepared by: Eric Slack, Sr. Analyst May 2012 Storage Infrastructures for Big Data Workflows Introduction Big

More information

Concept and Project Objectives

Concept and Project Objectives 3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the

More information

Windows Server 2008 R2 Hyper-V Live Migration

Windows Server 2008 R2 Hyper-V Live Migration Windows Server 2008 R2 Hyper-V Live Migration White Paper Published: August 09 This is a preliminary document and may be changed substantially prior to final commercial release of the software described

More information

Frequently Asked Questions

Frequently Asked Questions Frequently Asked Questions 1. Q: What is the Network Data Tunnel? A: Network Data Tunnel (NDT) is a software-based solution that accelerates data transfer in point-to-point or point-to-multipoint network

More information

Overlapping Data Transfer With Application Execution on Clusters

Overlapping Data Transfer With Application Execution on Clusters Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer

More information

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010 Best Practices for Data Sharing in a Grid Distributed SAS Environment Updated July 2010 B E S T P R A C T I C E D O C U M E N T Table of Contents 1 Abstract... 2 1.1 Storage performance is critical...

More information

Storage of the Experimental Data at SOLEIL. Computing and Electronics

Storage of the Experimental Data at SOLEIL. Computing and Electronics Storage of the Experimental Data at SOLEIL 1 the SOLEIL infrastructure 2 Experimental Data Storage: Data Hierarchisation Close Data : beamline local access 3 to 4 days min. Recent Data : fast access, low

More information

Technical White Paper. Symantec Backup Exec 10d System Sizing. Best Practices For Optimizing Performance of the Continuous Protection Server

Technical White Paper. Symantec Backup Exec 10d System Sizing. Best Practices For Optimizing Performance of the Continuous Protection Server Symantec Backup Exec 10d System Sizing Best Practices For Optimizing Performance of the Continuous Protection Server Table of Contents Table of Contents...2 Executive Summary...3 System Sizing and Performance

More information

www.thinkparq.com www.beegfs.com

www.thinkparq.com www.beegfs.com www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a

More information

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration Solutions Integrated Storage Appliances Management Optimized Storage & Migration Archive Data Retention & Compliance Services Global Installation & Support SECURING THE FUTURE OF YOUR DATA w w w.q sta

More information

I. General Database Server Performance Information. Knowledge Base Article. Database Server Performance Best Practices Guide

I. General Database Server Performance Information. Knowledge Base Article. Database Server Performance Best Practices Guide Knowledge Base Article Database Server Performance Best Practices Guide Article ID: NA-0500-0025 Publish Date: 23 Mar 2015 Article Status: Article Type: Required Action: Approved General Product Technical

More information

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory Customer Success Story Los Alamos National Laboratory Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory June 2010 Highlights First Petaflop Supercomputer

More information

SAN/iQ Remote Copy Networking Requirements OPEN iscsi SANs 1

SAN/iQ Remote Copy Networking Requirements OPEN iscsi SANs 1 SAN/iQ Remote Copy Networking Requirements OPEN iscsi SANs 1 Application Note: SAN/iQ Remote Copy Networking Requirements SAN/iQ Remote Copy provides the capability to take a point in time snapshot of

More information

Amazon Cloud Storage Options

Amazon Cloud Storage Options Amazon Cloud Storage Options Table of Contents 1. Overview of AWS Storage Options 02 2. Why you should use the AWS Storage 02 3. How to get Data into the AWS.03 4. Types of AWS Storage Options.03 5. Object

More information

A Survey of Shared File Systems

A Survey of Shared File Systems Technical Paper A Survey of Shared File Systems Determining the Best Choice for your Distributed Applications A Survey of Shared File Systems A Survey of Shared File Systems Table of Contents Introduction...

More information

Petascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing

Petascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Petascale Software Challenges Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons

More information

Network Attached Storage. Jinfeng Yang Oct/19/2015

Network Attached Storage. Jinfeng Yang Oct/19/2015 Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability

More information

Large File System Backup NERSC Global File System Experience

Large File System Backup NERSC Global File System Experience Large File System Backup NERSC Global File System Experience M. Andrews, J. Hick, W. Kramer, A. Mokhtarani National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory

More information

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata Implementing Network Attached Storage Ken Fallon Bill Bullers Impactdata Abstract The Network Peripheral Adapter (NPA) is an intelligent controller and optimized file server that enables network-attached

More information

A Deduplication File System & Course Review

A Deduplication File System & Course Review A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror

More information

Protecting Information in a Smarter Data Center with the Performance of Flash

Protecting Information in a Smarter Data Center with the Performance of Flash 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com 212.367.7400 Protecting Information in a Smarter Data Center with the Performance of Flash IBM FlashSystem and IBM ProtecTIER Printed in

More information

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Better Together Writer: Bill Baer, Technical Product Manager, SharePoint Product Group Technical Reviewers: Steve Peschka,

More information

How to recover a failed Storage Spaces

How to recover a failed Storage Spaces www.storage-spaces-recovery.com How to recover a failed Storage Spaces ReclaiMe Storage Spaces Recovery User Manual 2013 www.storage-spaces-recovery.com Contents Overview... 4 Storage Spaces concepts and

More information

Deploying VSaaS and Hosted Solutions Using CompleteView

Deploying VSaaS and Hosted Solutions Using CompleteView SALIENT SYSTEMS WHITE PAPER Deploying VSaaS and Hosted Solutions Using CompleteView Understanding the benefits of CompleteView for hosted solutions and successful deployment architecture Salient Systems

More information

Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software

Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software The Video Edition of XenData Archive Series software manages one or more automated data tape libraries on

More information

Development of a Standardized Data-Backup System for Protein Crystallography (PX-DBS) Michael Hellmig, BESSY GmbH

Development of a Standardized Data-Backup System for Protein Crystallography (PX-DBS) Michael Hellmig, BESSY GmbH Development of a Standardized Data-Backup System for Protein Crystallography (PX-DBS) Michael Hellmig, BESSY GmbH Introduction Motivation Individual solutions for data backup at each synchrotron PX beamline

More information

Enterprise Backup and Restore technology and solutions

Enterprise Backup and Restore technology and solutions Enterprise Backup and Restore technology and solutions LESSON VII Veselin Petrunov Backup and Restore team / Deep Technical Support HP Bulgaria Global Delivery Hub Global Operations Center November, 2013

More information

Implementing Offline Digital Video Storage using XenData Software

Implementing Offline Digital Video Storage using XenData Software using XenData Software XenData software manages data tape drives, optionally combined with a tape library, on a Windows Server 2003 platform to create an attractive offline storage solution for professional

More information

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING José Daniel García Sánchez ARCOS Group University Carlos III of Madrid Contents 2 The ARCOS Group. Expand motivation. Expand

More information

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data Actifio Big Data Director Virtual Data Pipeline for Unstructured Data Contact Actifio Support As an Actifio customer, you can get support for all Actifio products through the Support Portal at http://support.actifio.com/.

More information

Deploying Riverbed wide-area data services in a LeftHand iscsi SAN Remote Disaster Recovery Solution

Deploying Riverbed wide-area data services in a LeftHand iscsi SAN Remote Disaster Recovery Solution Wide-area data services (WDS) Accelerating Remote Disaster Recovery Reduce Replication Windows and transfer times leveraging your existing WAN Deploying Riverbed wide-area data services in a LeftHand iscsi

More information

HyperQ DR Replication White Paper. The Easy Way to Protect Your Data

HyperQ DR Replication White Paper. The Easy Way to Protect Your Data HyperQ DR Replication White Paper The Easy Way to Protect Your Data Parsec Labs, LLC 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com

More information

Improvement Options for LHC Mass Storage and Data Management

Improvement Options for LHC Mass Storage and Data Management Improvement Options for LHC Mass Storage and Data Management Dirk Düllmann HEPIX spring meeting @ CERN, 7 May 2008 Outline DM architecture discussions in IT Data Management group Medium to long term data

More information

Redefining Microsoft SQL Server Data Management. PAS Specification

Redefining Microsoft SQL Server Data Management. PAS Specification Redefining Microsoft SQL Server Data Management APRIL Actifio 11, 2013 PAS Specification Table of Contents Introduction.... 3 Background.... 3 Virtualizing Microsoft SQL Server Data Management.... 4 Virtualizing

More information

Continuous Data Protection. PowerVault DL Backup to Disk Appliance

Continuous Data Protection. PowerVault DL Backup to Disk Appliance Continuous Data Protection PowerVault DL Backup to Disk Appliance Continuous Data Protection Current Situation The PowerVault DL Backup to Disk Appliance Powered by Symantec Backup Exec offers the industry

More information

Diagram 1: Islands of storage across a digital broadcast workflow

Diagram 1: Islands of storage across a digital broadcast workflow XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,

More information

POSIX and Object Distributed Storage Systems

POSIX and Object Distributed Storage Systems 1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome

More information

Avid ISIS 7000. www.avid.com

Avid ISIS 7000. www.avid.com Avid ISIS 7000 www.avid.com Table of Contents Overview... 3 Avid ISIS Technology Overview... 6 ISIS Storage Blade... 6 ISIS Switch Blade... 7 ISIS System Director... 7 ISIS Client Software... 8 ISIS Redundant

More information

NAS or iscsi? White Paper 2007. Selecting a storage system. www.fusionstor.com. Copyright 2007 Fusionstor. No.1

NAS or iscsi? White Paper 2007. Selecting a storage system. www.fusionstor.com. Copyright 2007 Fusionstor. No.1 NAS or iscsi? Selecting a storage system White Paper 2007 Copyright 2007 Fusionstor www.fusionstor.com No.1 2007 Fusionstor Inc.. All rights reserved. Fusionstor is a registered trademark. All brand names

More information

Quantifying Hardware Selection in an EnCase v7 Environment

Quantifying Hardware Selection in an EnCase v7 Environment Quantifying Hardware Selection in an EnCase v7 Environment Introduction and Background The purpose of this analysis is to evaluate the relative effectiveness of individual hardware component selection

More information

Integrated Grid Solutions. and Greenplum

Integrated Grid Solutions. and Greenplum EMC Perspective Integrated Grid Solutions from SAS, EMC Isilon and Greenplum Introduction Intensifying competitive pressure and vast growth in the capabilities of analytic computing platforms are driving

More information

Distribution One Server Requirements

Distribution One Server Requirements Distribution One Server Requirements Introduction Welcome to the Hardware Configuration Guide. The goal of this guide is to provide a practical approach to sizing your Distribution One application and

More information

Chapter. Medical Product Line Architectures 12 years of experience. B.J. Pronk Philips Medical Systems

Chapter. Medical Product Line Architectures 12 years of experience. B.J. Pronk Philips Medical Systems Chapter Medical Product Line Architectures 12 years of experience B.J. Pronk Philips Medical Systems Key words: Abstract: Example architectures, product line architectures, styles and patterns The product

More information