Storage of the Experimental Data at SOLEIL
The SOLEIL Infrastructure
Experimental Data Storage: Data Hierarchisation
- Close data: local access at the beamline; kept 3 to 4 days minimum.
- Recent data: fast, low-latency access; kept 100 days maximum.
- Long-term data: online access without any human intervention; kept 1 year minimum to 5 years maximum.
- Archive data: older data that do not need immediate access.
Experimental Data Storage: The Needs
An inventory of user needs was carried out in 2004 in order to:
- understand how users would work
- identify and characterize the data
- analyze the data life cycle
Synthesis of the needs:
- The amount of data varies widely from one beamline to another.
- Data lifetime is variable (from 2 weeks to indefinite retention).
- Data must be permanently available.
- Users want to export data over the network or on physical media.
- Archiving is needed for long-term storage.
Experimental Data Storage: Hardware Architecture (1)
- Local storage and access point for close data at each beamline: DELL servers with 300 GB to 1.8 TB of SCSI hard disks
- Access points in each of the two computing rooms: DELL servers
- Primary storage for recent data: EMC SATA disk libraries (40 TB to 113 TB)
- Secondary storage for long-term data: GRAU tape libraries, 1344 slots, with LTO3 tapes (400 GB native, 800 GB compressed) and SAIT1 tapes (500 GB native, 1300 GB compressed)
- A dedicated storage network: Gigabit IP over optical fibre
Experimental Data Storage: Hardware Architecture (2)
[Diagram: storage architecture. Recoverable labels:]
- Beamlines: CASSIOPEE, ODE, SMIS, AILES, MARS, PLEIADES, DISCO, DESIRS, MICROFOC, DIFFABS, TEMPO, DEIMOS, METROLOGIE, CRISTAL, SWING, PROXIMA1, SAMBA
- At the beamline: user stations, control and data acquisition system, NFS/CIFS access
- Local storage sizes shown: 250 GB, 440 GB, 1800 GB
- Central Building and Synchrotron Building, each with an EMC CX700 (112.5 TB) and a GRAU ITL-XL tape library
- GRAU ITL-XL (SAIT1): up to 1.7 PB compressed
- GRAU ITL-XL (LTO3): up to 1 PB compressed
- Names shown: florea, dorsata, violacea, cerana, mellifera
- Also shown: processing cluster, site network, Internet
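The "up to 1 PB" and "up to 1.7 PB" figures can be cross-checked against the slot count and per-tape capacities quoted for the GRAU libraries. The sketch below is only a back-of-the-envelope check; it assumes that each library has 1344 slots and that every slot holds one tape, which the slides do not state explicitly.

```python
# Assumption: each GRAU library has 1344 slots and every slot holds one tape.
SLOTS_PER_LIBRARY = 1344

# Per-tape capacities in GB as quoted on the slide (low and high figures).
TAPE_CAPACITY_GB = {
    "LTO3": (400, 800),
    "SAIT1": (500, 1300),
}


def library_capacity_pb(tape_type):
    """Return (low, high) capacity of one full library in PB (1 PB = 10**6 GB)."""
    low_gb, high_gb = TAPE_CAPACITY_GB[tape_type]
    return (SLOTS_PER_LIBRARY * low_gb / 1e6, SLOTS_PER_LIBRARY * high_gb / 1e6)


for tape_type in TAPE_CAPACITY_GB:
    low, high = library_capacity_pb(tape_type)
    print(f"{tape_type}: {low:.2f} PB to {high:.2f} PB")
```

Under these assumptions the high figures come out at about 1.08 PB for LTO3 and 1.75 PB for SAIT1, in line with the values shown on the diagram.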
Software Solution
Based on the concept of cellular storage: ACTIVE CIRCLE.
A circle is a set of cells (servers) connected via IP, with no hierarchy between them. The same kernel runs on all the cells and includes features for:
- Continuous backup: the state of the file systems can be viewed instantly at any point in their history, and previous versions of a file can be restored.
- Replication: data are available simultaneously on several cells.
- Hierarchical storage: data are migrated automatically from disk to tape according to a defined strategy.
- Multi-site sharing: the same data can be shared between several sites; read/write conflicts are managed automatically.
- High availability: the system continuously checks that the other cells are present. If a cell disappears, a fail-over mechanism automatically redirects requests to the closest cell.
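The fail-over rule (redirect requests to the closest cell that is still present) can be illustrated with a few lines of Python. This is not Active Circle code: the cell names, the notion of "distance" and the `is_alive` check are all hypothetical stand-ins chosen for the illustration.

```python
def pick_cell(cells, is_alive):
    """Return the name of the closest cell that responds, or None if none does.

    cells maps a cell name to a distance (any comparable cost);
    is_alive is a callable reporting whether a cell currently responds.
    """
    for name in sorted(cells, key=cells.get):
        if is_alive(name):
            return name
    return None


# Hypothetical cell names and distances.
cells = {"cell_a": 1, "cell_b": 2, "cell_c": 5}
down = {"cell_a"}

# With cell_a down, requests go to the next closest cell.
print(pick_cell(cells, lambda name: name not in down))
```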
The Strategy
- Experimental data are deposited on the beamline access points or written from the cluster.
- Data are copied almost immediately to the two EMC disk libraries and to the two GRAU tape libraries.
- Data remain in the local cache for 4 days (until the circular cache is in place).
- After 100 days the two EMC copies expire; the two tape copies remain available.
- The two tape copies expire after 1 to 5 years, depending on the beamline.
- On request, data can be archived before their tape copies expire.
- When the storage technology changes, we will have to decide with the beamlines which archives must be transferred to the new technology.
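The retention rules above can be summarized as a small function that lists which copies of a file still exist at a given age. It is a sketch, not the actual policy engine: treating a year as 365 days and counting the expiry day itself as still available are assumptions, and archiving on request is not modelled.

```python
from datetime import date, timedelta

# Retention periods taken from the strategy above.
LOCAL_CACHE_DAYS = 4
DISK_COPY_DAYS = 100


def available_copies(recorded_on, today, tape_retention_years):
    """List the storage levels that still hold a copy of a file.

    tape_retention_years is between 1 and 5, depending on the beamline.
    Assumption: a year is approximated as 365 days.
    """
    if not 1 <= tape_retention_years <= 5:
        raise ValueError("tape retention must be between 1 and 5 years")
    age = (today - recorded_on).days
    copies = []
    if age <= LOCAL_CACHE_DAYS:
        copies.append("beamline local cache")
    if age <= DISK_COPY_DAYS:
        copies.append("EMC disk (2 copies)")
    if age <= 365 * tape_retention_years:
        copies.append("GRAU tape (2 copies)")
    return copies


recorded = date(2008, 1, 1)  # arbitrary example date
for days in (1, 30, 200, 800):
    today = recorded + timedelta(days=days)
    print(days, available_copies(recorded, today, tape_retention_years=2))
```

With a 2-year tape retention, a file is on all three levels after 1 day, on disk and tape after 30 days, on tape only after 200 days, and expired after 800 days unless it was archived.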
The Data Format
Data Format
The SOLEIL choice: the NeXus format (http://www.nexusformat.org)
Initially based on HDF (Hierarchical Data Format, used by several major organizations such as NASA and NOAA)
Key points:
- Able to store anything from extremely simple data, e.g. a simple (x,y) array, to highly complex instrument descriptions
- Able to integrate contextual data
- Self-describing format
- Efficient in terms of storage space and access time
- Extensible
- Already used by other facilities
Data Format
NeXus: a format for organizing the data inside a file
- The file is a data tree divided into groups.
- NXentry: a top-level group inside the file that holds all the data related to one acquisition, both contextual data and experimental data.
Example: 3 NXentry groups, each containing the data of one scan.
Data Format
The NeXus format can store the contextual data associated with a set of experimental data, together with the experimental data themselves:
- Equipment status at acquisition time: motor positions, machine current
- Data related to the users
- Description of the sample
- Description of the acquisition process (to be developed)
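To make the tree structure concrete, the sketch below models a file with 3 NXentry groups, each holding one scan and its context, using plain Python dictionaries. It does not read or write real NeXus/HDF files. The class names NXentry, NXuser, NXsample, NXinstrument and NXdata are NeXus base classes; every group name, field name and value is a hypothetical placeholder.

```python
def make_entry(x, y):
    """Build one NXentry-like group holding a scan and its context (plain dicts)."""
    return {
        "class": "NXentry",
        "children": {
            # Contextual data (all values are hypothetical placeholders).
            "user": {"class": "NXuser", "children": {"name": "hypothetical user"}},
            "sample": {"class": "NXsample", "children": {"description": "hypothetical sample"}},
            "instrument": {
                "class": "NXinstrument",
                "children": {"motor_position": 12.5, "machine_current_mA": 400.0},
            },
            # Experimental data.
            "data": {"class": "NXdata", "children": {"x": x, "y": y}},
        },
    }


def list_paths(name, node, prefix=""):
    """Yield the path of every group and dataset in the tree, depth first."""
    path = f"{prefix}/{name}"
    if isinstance(node, dict) and "children" in node:
        yield f"{path} ({node['class']})"
        for child_name, child in node["children"].items():
            yield from list_paths(child_name, child, path)
    else:
        yield path


# One file containing 3 NXentry groups, one per scan.
nexus_file = {f"scan_{i}": make_entry([0, 1, 2], [i, i + 1, i + 2]) for i in (1, 2, 3)}

for entry_name, entry in nexus_file.items():
    for path in list_paths(entry_name, entry):
        print(path)
```

Each printed path, such as `/scan_1/data/y`, identifies one item in the tree, which is what makes the layout self-describing.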
The Data Recording
Data Recording
Provided by a TANGO device server: the DataRecorder DS.
[Diagram labels: experimental data; GUI (experiment control); TANGO control system; DataRecorder (libdatastorage, data collectors); experimental data + contextual data; NeXus files; storage; beamline configuration (XML recording models, NeXus meta-DTD)]
Data Recording
[Diagram labels: TANGO bus; DataRecorder, NeXusReader, TechnicalData, ExperimentalFrame and AuthServer devices; storage; LDAP (Active Directory)]
Recording Devices
- DataRecorder: writes experimental data and the associated metadata to NeXus files.
- AuthServer: ensures the confidentiality of data; data files are recorded in the right place in the storage facility.
- NeXusReader: reads NeXus files and exposes datasets, on demand, as dynamic attributes.
Recording Devices
- TechnicalData: a device that maintains two lists of device servers (DS) holding information about the technical environment. The DataRecorder reads these lists and records the corresponding data at the beginning and at the end of a recording session (i.e. for an NXentry group).
- SampleData: the sample data collector device. The aim of this DS is to provide a unified interface to sample information.
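The "record technical data at the beginning and at the end of a session" behaviour can be sketched with a context manager. This is plain Python and not the TANGO API or the DataRecorder implementation: the sources are zero-argument callables standing in for device attributes, the dictionary keys are hypothetical, and the sketch assumes that one list is read at the start and the other at the end.

```python
from contextlib import contextmanager


@contextmanager
def recording_session(entry, start_sources, end_sources):
    """Record technical data into `entry` at the start and at the end of a session.

    start_sources / end_sources map an attribute name to a zero-argument
    callable returning its current value (stand-ins for device attributes).
    """
    entry["technical_data_start"] = {name: read() for name, read in start_sources.items()}
    entry["experimental_data"] = []
    try:
        yield entry["experimental_data"]
    finally:
        # The end snapshot is taken even if the acquisition raises an error.
        entry["technical_data_end"] = {name: read() for name, read in end_sources.items()}


# Hypothetical machine current readings at the start and end of the session.
readings = iter([400.2, 399.8])
sources = {"machine_current": lambda: next(readings)}

entry = {}  # plays the role of one NXentry group
with recording_session(entry, sources, sources) as data:
    data.append((0.0, 12))
    data.append((0.1, 15))

print(entry)
```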
Data Recording
Configuration tools:
- Data Storage Control Center: a set of components (beans) that control the key functions of the recording system.
Bean AuthServer
- Generates a key associated with a project.
- Allows only authorized users to record, and later consult, the data from their experiment.
Bean DataRecorder
Used to record data and to configure how they are saved.
Bean TechnicalData
Used to select the list of devices whose attributes are to be collected and recorded at the beginning or end of a recording session.
Bean ExperimentalFrame
Used to save the experimental context associated with one or several experiments.
Data Recording
Bolero: used to edit the XML files (in the DataRecorder configuration) that describe the data sources, in this case the TANGO devices.
Tools to Display, Extract and Exploit the Recorded Data
Storage of the Experimental Data: Tools
To display data:
- BALADI: easily visualizes data from new NeXus files as soon as the acquisition system produces them.
Tools
To retrieve data:
- TWIST: web access with a GUI to easily search and extract data over the network, with the ability to explore the content of data files.
Tools
To exploit data: the first SOLEIL-specific tools
- NXextract: a tool that exports data from a NeXus file into a file in an arbitrary format (ASCII or binary). It uses scripts to describe these formats. Many scripts have been written, for example to:
  - extract all images from a NeXus file in JPEG format
  - create an EDF file (ESRF format)
  - extract 2D scans from each NXentry group in a NeXus file
- An IGOR plug-in to read NeXus files directly (to be completed).
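The idea of exporting scan data per NXentry group into an ASCII file can be sketched as follows. This is not NXextract and does not use its script syntax, which is not described here; the entries are plain dictionaries with hypothetical names and values, and the output layout (commented headers, tab-separated columns) is an arbitrary choice.

```python
import io


def export_ascii(entries, columns, out, separator="\t"):
    """Write the named columns of every entry to `out` as ASCII text.

    entries maps an entry name to a dict of equal-length column lists.
    """
    for entry_name, datasets in entries.items():
        out.write(f"# {entry_name}\n")
        out.write("# " + separator.join(columns) + "\n")
        for row in zip(*(datasets[c] for c in columns)):
            out.write(separator.join(str(v) for v in row) + "\n")


# Hypothetical scans, one per NXentry-like group.
entries = {
    "scan_1": {"x": [0, 1, 2], "y": [10, 11, 12]},
    "scan_2": {"x": [0, 1, 2], "y": [20, 21, 22]},
}

buffer = io.StringIO()
export_ascii(entries, ["x", "y"], buffer)
print(buffer.getvalue())
```

Writing to any file-like object keeps the sketch independent of where the export ends up (a local file, a network stream, or a buffer as here).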