NSO Data Center Conceptual Design


NSO Data Center Conceptual Design
January 11, 2014
Version 0.4
Kevin Reardon

Table of Contents

Document Overview
    Purpose
    Method
    Scope
    References
System Design
    System Components
    System Architecture
    Data Transfer
    Data and Metadata Management
    Data Processing
    Data Discovery
Development Areas
    Architecture
    Software Modules
Work Breakdown Structure
Appendix 1: Data Processing Levels
Appendix 2: Data Compression Levels
Appendix 3: Software Components

Revision History

Date      Version   Released By   Comments
07/10/              K. Reardon    for internal review
10/01/              K. Reardon    revisions based on comments, changed processing levels

1. Document Overview

1.1. Purpose

The following document sketches out the essential components that form the basis of the NSO Data Center functionality. The document lays out a strawman architectural design that will support the desired system capabilities. It presents a preliminary work breakdown structure.

1.2. Method

The strategic and scientific goals that motivate the overall design and operation of the Data Center are taken from the Vision and Scope Document for the NSO Data Center Development and the NSO Data Center Business Requirements. This document builds on those descriptions, and incorporates the Calibration Processing and Storage Requirements for ATST Data and the Data Center Interface Requirements, to describe some of the primary data center components. Information on proposed implementation solutions is also described. This leads to a breakdown of the different software (and hardware) components that will be the focus of the data center development.

1.3. Scope

This document seeks to provide some technical discussion and potential implementation approaches that can meet some of the Data Center needs. These are not meant to be final or even preferred solutions, but are rather intended to demonstrate some of the key issues and implementable technologies that make the proposed data center viable. The document also describes a Work Breakdown Structure and development packages that can serve as a guideline in estimating the development schedule and effort.

1.4. References

Vision and Scope Document for NSO Data Center Development [VSD], October 2013
NSO Data Center Business Requirements [NDCBR], November 2013
NSO Data Center Use Cases [NDCUC], October 2013
Calibration Concepts for ATST Data [CCAD], October 2013
Calibration Processing and Storage Requirements for ATST Data [CPSR], October 2013
Data Center Interface Requirements for ATST Data [CDIR], October 2013

2. System Design

2.1. System Components

The system will comprise several components (software, hardware, personnel) that will work in concert to provide an adequate service to the Data Center users, both immediately and in the future. To enable the capabilities outlined in the overall system vision (VSD, NDCBR), the system will need the following major operational components, as already outlined in the VSD.

Data Transfer:
- CO 1: Data ingest system to receive data files, database contents, and other structured and unstructured content from the ATST and other facilities.
- CO 2: Data transfer system to move that information to a data storage system controlled by the Data Center, providing the means to validate data integrity.
- CO 12: Data delivery mechanisms that allow datasets of different volumes and types to be delivered to the user in an efficient manner.

Interfaces:
- CO 4: Interfaces to provide seamless access to digital resources to NSO researchers and the community, capable of supporting a range of different usage scenarios.
- CO 11: Data discovery interfaces for efficient and flexible searches of the data and metadata contents that allow users to locate different dataset collections of interest.
- CO 14: Public interfaces offering meaningful access to non-specialist users.

Computational Resources:
- CO 3: Data storage system that provides reliable, long-term maintenance of the data contents, designed for straightforward migration to new technological solutions.
- CO 5: Data processing framework that allows scientific software modules to be run both automatically and with staff or user oversight.
- CO 9: Computational resources capable of performing the desired computations on the data in a timely and automated manner.
- CO 10: Computational resources that provide NSO researchers with low-latency access and an interactive data processing environment for Data Center contents.
- CO 6: Mechanisms to allow easy substitution or addition of new software modules into the data processing framework.
- CO 7: Software packages capable of processing the data coming from the ATST and other NSO projects to generate the desired data products in a timely manner.
- CO 8: Software practices and management system that allows easy maintenance of the Data Center software components.

Foundational Tools:
- CO 13: Data stewardship tools that allow the long-term integrity and usability of the NSO data collection to be preserved.
- CO 15: Security model able to protect NSO digital resources and reserved information.

2.2. System Architecture

Given the system functionality outlined above, and encompassing the operational components, we have outlined a basic system architecture. This high-level architecture is shown in Figure 3.

The system will need to manage significant volumes of data arrays, as well as numerous metadata elements associated with each of those data arrays. Those metadata may be modified or augmented over time. A significant amount of additional metadata may also be managed within the data center. The metadata will need to be accessed efficiently, but also often independently of the data arrays themselves. Therefore the system will be based on a Data Repository that provides separate, optimized management for the metadata and data elements.

The Data Storage system will manage the storage resources available to the system and allocate storage for data files that contain the data arrays archived by the system. The storage resources may be disk, tape, or even external storage resources.

The Content Management system will instead be tasked with managing all the metadata content available to the system. This will include the observational metadata (i.e., FITS header information) related to each data array or observation. The Content Management system will maintain associations between those metadata and the location in the Data Storage system of the corresponding data arrays. The Content Management system will also maintain other metadata aggregations, including Instrument, Calibration, and Curation databases, as well as information on user identities and authorizations.

The third principal component of the Data Center will be the Processing system, which will be responsible for enabling and executing any calculations or transformations to be applied to the data. This system will encompass a version-controlled repository of software packages that will be executed in a workflow-controlled environment. The primary tasks for these software packages will be the Calibration, Analysis, and Packaging of the Data Center content. The Processing system will be able to avail itself of external computational resources when the directly managed resources are insufficient.

The user will interact with the system by making queries (and receiving responses) and scheduling data delivery. The users (or a subset thereof) will also be allowed to provide modifications to some metadata content, annotations to datasets, or auxiliary data content.

In the following sections we examine some of these components in more detail, along with possible implementation approaches to be explored.

2.3. Data Transfer

Data Transfer - Telescope to Data Center:

The data transfer system will be responsible for the retrieval of information recorded at the telescope and for managing its controlled ingest into the Data Center systems. This information will include data arrays, databases, logs, and possibly unstructured information (e.g., observer notes). The export of the data from the telescope is expected to be performed using a combination of network transfers and transport of physical media. In either case, the data will be made available to the Data Center over a controlled interface. The relative usage of these two transport methods will vary on a daily basis, depending on acquired data volumes, and on the longer term as on-site storage capacity and network bandwidth evolve.

The current estimate for daily data transfer volumes is 12 TB, which corresponds to slightly less than 1.2 Gbit/sec sustained transfer rates. Peak daily data volumes may be 50 TB or more, which requires a 5 Gbit/sec average rate for transfer within 24 hours, or 2.5 Gbit/sec within 48 hours. The data transfer mechanisms should be implemented so that they do not produce a bottleneck in the system. Even in the presence of sufficient and cost-effective bandwidth off the mountain, physical media transport should serve as a fallback in case of large daily data volumes or interruptions in network connectivity. The system should use abstraction to maintain a level of independence from the transport method. It should also be noted that the dual mechanisms for file transport may result in transfers of some files being duplicated through both channels. The system should identify and eliminate any such duplication, while being sensitive to file versions.

Figure 3: Conceptual design of the Data Center showing the Data Repository, comprised of the Data Storage and Content Management systems, and the Processing System. Data is received from ATST (and NISP) as indicated by the arrows entering from the left. Other data (including catalogs) may be ingested from other facilities ("Context Data") or from Users ("Data Return"). The system will also interface with external Storage and Processing resources (in current parlance, "the cloud").

The file transfer will be based on existing tools that enable reliable duplication of files. For network transfers, GridFTP (for one implementation see GridFTP - The Globus Project; see also the GridFTP v2 Protocol Description - Open Grid Forum and HPC SSH/SCP) is a suitable candidate for managing the transfer of data files from the

summit storage system to Data Center controlled resources. GridFTP is a high-performance data transfer protocol that has been developed for reliable and high-volume transmission of data files. Based on the File Transfer Protocol with several key extensions, GridFTP provides for parallel transfers, fault tolerance, TCP optimization, data integrity checks, and a security layer.

Figure 4: Data Center data flow overview, showing interactions among several system components. The data transfer mechanism is outlined at the top, while processing, query, and data delivery follow in succession in the idealized scenario. The Calibration Database is shown, for demonstrative purposes, as an external component to the Data Repository.

The Data Center could access the directory structure of the file export system at the summit and, on a regular basis, automatically identify any new files present (based on a stored list of already transferred files). Those files could then be queued for transfer. The system will also be able to accept prioritization of the data transfers. Prioritization levels can be modified either by external input from the telescope or by data center personnel. The prioritization can also be defined based on internal rules, where for example the system would preferentially transfer any data calibration files it finds on the remote file system.

Similar mechanisms might also be used for the transfer from the filesystem on the physical media moved down the mountain and attached to Data Center controlled computational resources. Depending on achievable and required data off-load speeds, GridFTP could be used for local transfers as well to facilitate transfer mechanism independence.
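To make the queuing and prioritization behavior described above more concrete, the following is a minimal sketch, not a design commitment, of how newly detected files might be queued for transfer, with calibration files promoted ahead of science frames. The priority values, the filename convention used to recognize calibration data, and names such as already_transferred are illustrative assumptions, not defined interfaces.

```python
import heapq
from pathlib import Path

# Illustrative priorities: lower value = transferred sooner.
PRIORITY_CALIBRATION = 0
PRIORITY_SCIENCE = 10

def build_transfer_queue(remote_listing, already_transferred):
    """Queue any files not yet transferred, promoting calibration data.

    remote_listing      -- iterable of file names visible on the summit export area
    already_transferred -- set of names recorded as safely ingested by the Data Center
    """
    queue = []
    for path in remote_listing:
        if path in already_transferred:
            continue  # avoid duplicate transfers across the network/media channels
        name = Path(path).name.lower()
        # Assumed naming convention: calibration frames identifiable from the filename.
        is_cal = any(tag in name for tag in ("cal", "dark", "flat"))
        priority = PRIORITY_CALIBRATION if is_cal else PRIORITY_SCIENCE
        heapq.heappush(queue, (priority, path))
    return queue

# Example: the dark and flat frames would be popped before the science frames.
listing = ["vbi_science_0001.fits", "vbi_dark_0001.fits",
           "visp_flat_0002.fits", "visp_science_0002.fits"]
queue = build_transfer_queue(listing, already_transferred={"vbi_science_0001.fits"})
while queue:
    priority, path = heapq.heappop(queue)
    print(priority, path)
```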

Otherwise, robust direct transfer mechanisms such as rsync could be used to move data among filesystems. Additional solutions, including commercial software, should also be explored.

A key requirement of any data transfer mechanism will be to establish data identity as early as possible and to monitor data integrity during the movement of the data through the Data Center systems. The preferred mechanism will be for the systems at the telescope to generate one or more checksums for all exported data files. These checksums would be transmitted to the data center, and would also serve to communicate the list of files the telescope has made available for transfer. The Data Center would use this list to validate the integrity of any retrieved files, or to identify unavailable files. For those files that were correctly transferred, the Data Center will formally assume responsibility, which will be communicated to the summit systems over an agreed-upon interface. For any corrupted or missing files, the Data Center will request retransmission, specifying the cause of the transfer failure. This transfer interface will include additional mechanisms to maintain robustness and avoid placing stress on telescope or facility resources in case of possible failure modes. The proposed sequencing of the data transfer is shown in the upper portion of Figure 2.

In addition to the transfer of the data files (and related lists of checksums), the summit facilities will also replicate the full contents of the Header Database to the Data Center on a frequent basis (at least daily). This will allow direct access in the Data Center to the full metadata content for all observations without the need to parse the FITS headers. Other metadata content recorded at the summit may also be copied to the Data Center for long-term preservation or future data analysis technologies. The system should be scalable to accept new types of content from the telescope.

The nature of the data transfer via physical media, and the need to accept responsibility for the acquired data as soon as possible (so that storage resources on the mountain can be freed for subsequent observations), implies the need for the data ingest to occur at a facility located close to the telescope itself. The baseline assumption has been that the Data Center will maintain resources, at minimum a storage buffer, on Maui, possibly at the NSO-managed base facility. This would be considered a Data Center branch, which may require some level of local staffing. Further operational requirements might dictate the need for some processing capacity at the branch location as well. Other possible locations on Maui could also be considered, with the requirement that they provide physical and network access for Data Center functions. If the bulk of the transfer from the telescope to the main Data Center could be achieved over network connections, then the need for significant Data Center resources on Maui could be much reduced. The relative cost effectiveness of such an approach (increased network bandwidth costs versus increased personnel and hardware costs) will be evaluated. There may also be additional organizations on Maui (MHPCC, MCC) that may partner with NSO and provide access to data-center-related resources.
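As one illustration of the manifest-based integrity check described earlier in this subsection, the sketch below compares a summit-supplied list of checksum/filename pairs against the files actually received by the Data Center, and reports which files can be formally accepted and which require retransmission. The manifest format and the use of SHA-256 are assumptions for the example, not an agreed interface.

```python
import hashlib
from pathlib import Path

def sha256_of(path, block_size=16 * 1024 * 1024):
    """Compute the SHA-256 checksum of a (potentially large) file in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def validate_against_manifest(manifest_path, ingest_dir):
    """Return (accepted, corrupted, missing) file lists for a transfer manifest.

    Assumed manifest format: one '<hex checksum>  <filename>' pair per line,
    as produced by tools such as sha256sum.
    """
    accepted, corrupted, missing = [], [], []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        local = Path(ingest_dir) / name.strip()
        if not local.exists():
            missing.append(name)       # file announced but never received
        elif sha256_of(local) == expected:
            accepted.append(name)      # Data Center can formally assume responsibility
        else:
            corrupted.append(name)     # request retransmission, citing checksum mismatch
    return accepted, corrupted, missing
```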
Data Transfer - Context Data:

The Data Center will also incorporate data content from external sources that will serve as crucial context data for the interpretation of the targeted ATST data. The system will support mechanisms for the ingest of those data, preferably using existing interfaces supported by those providers. Virtual observatory protocols could provide a common access method for multiple data sources, reducing the effort needed to implement and expand the sources of context data.
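As an illustration of how virtual-observatory style protocols might be used to pull in context data, the sketch below queries the Virtual Solar Observatory through the SunPy Fido interface for observations overlapping a hypothetical ATST observing window. The choice of library, instrument, and time range is purely illustrative; the actual context-data providers and access protocols remain to be selected.

```python
# Illustrative only: SunPy's Fido search is used here as one example of a
# VO-style client; the real context-data sources and interfaces are TBD.
from sunpy.net import Fido, attrs as a

def fetch_context_observations(start, end, instrument="AIA"):
    """Search (and optionally download) context data overlapping an observing window."""
    results = Fido.search(a.Time(start, end), a.Instrument(instrument))
    print(results)                  # catalog entries available for the window
    # files = Fido.fetch(results)   # uncomment to mirror the files locally
    return results

fetch_context_observations("2014-01-11 17:00", "2014-01-11 18:00")
```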

The information retrieved from external providers may include observation logs or event catalogs that will be used in searches for ATST datasets. By duplicating some of these catalogs within the Data Repository, it will be possible to provide faster searches and a more reliable user experience.

Data Transfer - Data Delivery:

Users will request copies of certain datasets to be transferred to their own computing resources. The system will attempt to provide these data in the most efficient way possible, taking into account limitations on Data Center and user resources. The two primary means of delivery will be network transfer or shipping of physical media. An order fulfillment system will be needed to allow users and data center staff to track data transfer requests.

Network transfers will be the preferred method of data delivery for reasonable volumes of data. There is a question of the upper limits on the data volumes that can be transferred in a reasonable manner over the network. For users with good connectivity, transfer speeds of 20 MB/sec or more can routinely be achieved at present. Organizations connected to Internet2 might achieve even greater data rates. However, 20 MB/sec results in a daily transfer volume of 1.6 TB, which is still significantly below the average daily data volume for ATST of 12 TB. Improvements in network capacity might improve the achievable transfer speeds, making transfers of a full day's dataset achievable (for some users). It remains to be seen what the typical transfer request might be for ATST data. Will users request full datasets, or will they be satisfied with a limited number of derived quantities, which may have significantly smaller volumes? Further system development will study the tradeoff in software development costs for the requested pipeline components against the possible recurring costs for network bandwidth.

Achieving the greatest transfer rates will require data packaging and transfer protocols to be designed in concert. GridFTP may again be a reasonable client for end users to retrieve data, its use in both data ingest and export possibly simplifying system maintenance. It will be key to support both parallel and incremental transfers to users to optimize bandwidth and avoid the need for any duplicate transmission. Assuming the Data Center storage resources will be located within the University of Colorado (CU) campus network, it will also be necessary to work with the research computing staff there to understand the impact of large sustained data transfers from the NSO Data Center on the overall campus connections to Internet2 and the commodity internet. Current internet connectivity is through 10 Gbit/sec connections, though it is expected that this may increase to a 100 Gbit/sec connection by the time of ATST operation.

As an alternative to network transfer, the system will also support transfer of large volumes of data to users on physical media. In this case the order fulfillment system will allow data center staff to receive, fulfill, and monitor orders as they are handled. The types of physical media supported will be limited to simplify transactions. Automated procedures will oversee the copying of data onto the requested physical media. Upon completion, the staff will be notified and will proceed to the shipping of the media.
Due to the added costs (media, personnel, shipping), efforts will be made to limit use of this method of transfer, including lower limits on the volume of data that can be transferred in this way.

Data Transfer - Data Repository Mirrors:

In order to reduce the outgoing utilization of the network resources, as well as possibly reducing the transfer time to users, it is to be expected that data mirrors will be set up at various strategic sites within the US and abroad. This approach, common in other fields, has been widely deployed in solar physics in the distribution of the data from a variety of satellites, the latest being the Solar Dynamics Observatory. A similar approach has been developed for ALMA and its network of Regional Centers (ARCs). Both of these systems will help inform the development of similar mechanisms for ATST. No formal proposals for ATST data mirrors have been received, but presumably these mirror sites would replicate only a portion of the data holdings, or only for a limited period of time.

2.4. Data and Metadata Management

Data Storage - Usage Scenarios:

A key role of the Data Center is to safely store NSO's digital resources, both for immediate use and as a long-term repository. These two primary needs have different access profiles and imply the need for some layering of technical solutions. The immediate use of the data involves accessing the data both for visualization purposes and to apply additional processing. Visualization uses typically require lower-bandwidth interactions but often place requirements on latency so that user productivity is not reduced. On the other hand, some processing of the data will be limited by the I/O rates for accessing the necessary data from the storage systems.

The datasets needed for immediate use will be in part predictable. Recently obtained datasets will be needed for immediate examination or for processing through standard pipelines. In addition, some automated analysis procedures may be able to indicate specific data contents that they will need to access as part of the expected processing. Data access history may provide guidance on the relative demand for specific data files, indicating which files are most likely to see immediate use. Interactive exploration and specialized processing under user control may result in a more random-access usage profile, but it is conceivable that users could, if required, provide guidance on data usage that would allow the system to optimize the performance for accessing the indicated datasets of interest.

A subset of data access scenarios require, for initial perusal, viewing data at reduced spatial and temporal resolution. A possible solution would be to maintain lower-resolution copies of some datasets that would satisfy the need for quick examination of highly used datasets (i.e., thumbnails of datasets). Once a dataset of interest is identified, the full-resolution data could be accessed with a greater delay but with only limited effect on user productivity.

It is possible that a significant part of the data content may see limited use. Policy decisions may be made about limiting the long-term retention of some datasets. However, there will always be datasets that have less value or do not offer direct utility for scientific studies. The long-term value needs to be weighed against the opportunity costs of preserving large volumes of such data. It is nonetheless worrisome to discard data from a facility engaged in discovery science. One option would be to use lossy compression techniques to greatly reduce the data volumes to be stored, while minimizing the amount of discarded solar information.
Given the characteristics of the data, it is reasonable to expect that a 10:1 compression ratio would still allow the data to be used for

scientific evaluation. This may be a reasonable compromise that allows an acceptable level of data preservation without incurring economic costs that could hamper other data center activities. The need to invoke such a solution may vary with time, depending on the changing data volumes and storage costs. The determination of the optimal compression types and parameters, as well as the preferable stage at which to apply the compression (raw, calibrated, reconstructed), will require further investigation.

Data Storage - Hardware:

Given the volumes of data generated at the telescope at full operations (4+ PB per year), and the additional storage that is needed for intermediate data products, a total data store of approximately 10 PB will presumably be required to support the NSO annual data center needs. The current standard for storage of these types of volumes is either rotating magnetic disks or linear magnetic tape.

Hard disks are undergoing continuous technological development and have a large commercial base driving future advances. Disk capacity has shown impressive growth over the past 20 years, as measured by the areal density (in Gb/in²) of the platter surface. Based on industry projections of future technological hurdles and solutions (Fig. 5), this growth rate may be slowing, and it is reasonable to expect only a conservative factor of 2-3 improvement in the hard-disk areal density in the coming five years. This would result in hard disks with upper capacities of approximately 10 TB (assuming the same number of platters) in 2018.

To provide a total of 10 PB of storage, such disk capacities translate into a need for approximately 1200 disks, including 20% RAID overhead for redundancy. With a storage system holding approximately 40 disks per 4U rack-mounted case, this requires a total of 120U of rack space, which could fit in just 3 racks, given suitable cooling and power supplies. Current costs for large-volume storage are on the order of $100,000 per PB (with costs as low as $60,000 reported). If we assume the increased disk volumes will translate into a comparable reduction in storage costs, a 10 PB storage system in 2018 has a projected capital cost (annually recurring) of approximately $400,000.

Following industry projections, we assume 1 kW of power per petabyte in 2018 for spinning disks, resulting in a total storage system power consumption of more than 90,000 kWh per year. Based on inflated Boulder electricity costs of $0.15/kWh, this results in an annual expense of $14,000 for disk storage. Energy conservation methods, such as the use of low-power disks and disk idling, can help reduce power consumption. This calculation does not include cooling costs for the data center. New technologies, including high-efficiency evaporative coolers, may allow for greatly reduced data center cooling costs. NSO should work with the CU Research Computing Department and other local research centers to identify efficient, sustainable approaches to reduce data center costs and energy consumption. The NSO data center should seek to obtain a Power Usage Effectiveness (PUE) of less than 1.25, in line with projected typical data center performance.
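The sizing and cost figures above follow from straightforward arithmetic; the short script below re-derives them under the stated assumptions (10 TB disks, 20% RAID overhead, roughly 40 disks per 4U chassis, 1 kW per PB, $0.15/kWh), so the estimates can be re-run as the assumptions change. The constants are the document's projections, not measured values, and the results differ from the quoted numbers only by rounding.

```python
# Rough storage sizing under the assumptions stated above (2018 projections).
USABLE_PB = 10            # required usable capacity
DISK_TB = 10              # projected per-disk capacity
RAID_OVERHEAD = 0.20      # redundancy overhead
DISKS_PER_4U = 40
RACK_UNITS_PER_RACK = 42
KW_PER_PB = 1.0           # projected power draw for spinning disks
USD_PER_KWH = 0.15        # assumed Boulder electricity cost

disks = USABLE_PB * 1000 / DISK_TB * (1 + RAID_OVERHEAD)   # ~1200 disks
rack_units = disks / DISKS_PER_4U * 4                      # ~120U
racks = rack_units / RACK_UNITS_PER_RACK                   # ~3 racks
power_kw = USABLE_PB * KW_PER_PB                           # ~10 kW
energy_kwh = power_kw * 24 * 365                           # ~87,600 kWh/yr (text rounds to >90,000)
energy_cost = energy_kwh * USD_PER_KWH                     # ~$13,000/yr (~$14,000 quoted above)

print(f"disks: {disks:.0f}, rack units: {rack_units:.0f}U, racks: {racks:.1f}")
print(f"power: {power_kw:.0f} kW, energy: {energy_kwh:,.0f} kWh/yr, cost: ${energy_cost:,.0f}/yr")
```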

Figure 5: Growth of hard-disk platter areal density over time, showing the strong increase for the past decade, with annual growth rates well above 50%. Those rates are expected to slow in the coming years as new technologies need to be developed to overcome some of the physical limitations of current approaches (from Marchon et al. 2013, Adv. in Tribology).

The latency and bandwidth expectations of the local users will require some local data storage resources. However, there is no a priori need that all the data be physically stored at or near the NSO headquarters in Boulder. Offsite data storage (i.e., cloud-based storage) should also be investigated, including the option of remote processing, with a careful examination of the economics of remote storage and network transfer costs. Some cloud resources may also address the need for offsite backup to ensure against data loss due to damage to the data center resources located near NSO headquarters.

In addition to the hard-disk based storage system, several other technologies may be worthy of analysis to see if they can address certain needs of the data center. Solid-state ("flash") storage (SSDs) will probably continue to have a significantly higher cost per TB than hard disks, and probably will not be cost effective for bulk data storage, even after factoring in the lower power and cooling costs. But SSDs may be suitable for temporary staging of data during processing, when data access times may be an important factor in processing time. This might be particularly efficient as the data are initially ingested after transfer from Maui.

Magnetic tapes might offer a solution for near-line storage of some datasets, primarily as a means to maintain a long-term archive of the datasets. Tapes can in some cases offer additional reliability for extended storage, with lower power and cooling costs. But latency times are much larger, and some sort of tape robot is needed to physically transport tapes to a restricted number of tape drives. The current maximum capacity for Linear Tape-Open (LTO) tape cartridges is 2.5 TB, at a cost of $25 per TB. The LTO roadmap projects an increase to 12+ TB capacities in future generations.

Data Storage - Management:

The efficient management of large volumes of data comprising numerous individual images is an area of research and ongoing development in commercial and academic fields. The Data Center development should build on these efforts, identifying other projects with similar data management needs. The design of the storage system will take into account the need to randomly access individual images within larger datasets, as well as to provide rapid access to large blocks of contiguous images. The data will be received from ATST in FITS format, and will typically be provided to users in FITS or other standard formats, but there is no requirement that the data be internally managed in the form of FITS files. The SDO and IRIS data management developments have employed techniques to store data cubes and their descriptive metadata in FITS format using binary tables, which avoids much of the overhead in processing FITS header records.

Some of the most directly relevant data management experience in the field of solar physics is in the mechanisms developed at the Joint Science Operations Center (JSOC) at Stanford University. They have developed the Storage Unit Management System (SUMS), which handles the hundreds of thousands of images obtained each day with solar spacecraft, including SDO and IRIS. The system tracks multiple copies of images, spread across different storage media (hard disk and tape), using a database server that tracks storage units, which are essentially directories and sub-directories of associated files. This virtualization of storage resources allows easy modification or updating of those resources with little downstream impact. While the SUMS software itself is probably not useful for direct implementation in the NSO Data Center, certain concepts and architectural elements should be studied as potentially suitable solutions to some of the Data Center needs. The experience gained in operating such a system in this field can provide guidance on the useful and problematic elements of such a virtualization approach.

Similar advantages, and perhaps a SUMS substitute, can be found in the integrated Rule-Oriented Data System (iRODS, https://www.irods.org/), which has been developed as part of a broader grid-related effort to build a next-generation data management system. iRODS uses a similar database system to track data across varied and distributed storage resources, but also incorporates a rule engine that allows implementation of services for administrative data management and user workflows (see Figure 6). While perhaps a somewhat more complicated system than SUMS in many respects, it has the advantage of being used and supported by a large user community (see the User Group Meeting program: https://www.irods.org/index.php/irods_user_group_meeting_2013).
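To illustrate the storage-virtualization idea borrowed from SUMS and iRODS, the sketch below keeps a small database table that maps logical storage units (directories of associated files) to their current physical medium and location, so that files can later be migrated between disk, tape, or cloud resources without changes to downstream code. The schema, field names, and unit identifier are invented for the example and are not a proposed Data Center data model.

```python
import sqlite3

# Minimal, illustrative schema: one row per logical storage unit.
conn = sqlite3.connect("storage_units.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS storage_unit (
        unit_id     TEXT PRIMARY KEY,   -- logical identifier used by the Content Management system
        medium      TEXT NOT NULL,      -- e.g. 'disk', 'tape', 'cloud'
        location    TEXT NOT NULL,      -- path, tape barcode, or object-store URI
        size_bytes  INTEGER,
        checksum    TEXT                -- integrity reference for stewardship checks
    )
""")

def register_unit(unit_id, medium, location, size_bytes=None, checksum=None):
    conn.execute(
        "INSERT OR REPLACE INTO storage_unit VALUES (?, ?, ?, ?, ?)",
        (unit_id, medium, location, size_bytes, checksum),
    )
    conn.commit()

def locate_unit(unit_id):
    """Resolve a logical unit to its current physical location (medium-agnostic)."""
    return conn.execute(
        "SELECT medium, location FROM storage_unit WHERE unit_id = ?", (unit_id,)
    ).fetchone()

# A unit can later be migrated (e.g. disk -> tape) by updating its row only;
# callers keep using the same unit_id.
register_unit("visp.20140111.001", "disk", "/archive/2014/01/11/visp/001")
print(locate_unit("visp.20140111.001"))
```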

Figure 6: The iRODS system architecture (from https://www.irods.org/index.php/File:irodsArch.jpg), showing both administrative and client interfaces and storage virtualization.

Metadata Management: TBD

2.5. Data Processing

Data Calibration: TBD

Data Products and Observables: TBD

2.6. Data Discovery

Discovery Interfaces: TBD

3. Development Areas

From the above discussion, we can identify the primary components that will make up the core of the proposed system. These are the areas that will require the bulk of the development resources. For each component there are typically several external sources that can provide input and experience about desirable solutions, as well as potentially actual reference implementations. Each component typically also has several issues related to the specific aspects of the ATST data management. For some components, estimated levels of effort have been outlined.

3.1. Architecture

Transfer mechanisms:
  Purpose: perform and monitor transfer of data from the base facility to NSO HQ
  Source: file replication programs, database mirroring
  Issues: monitor file integrity; parallel export paths for data and metadata; prioritized transfer of some datasets; compression

Data model:
  Purpose: provide a general data model applicable to a range of NSO data products and different processing procedures
  Source: VSO, SPASE, SolarNet, IVOA
  Issues: generalizable model; model expandability; cost/benefit of conformity to external standards

Metadata database:
  Purpose: provide a searchable archive of all data-related metadata
  Source: database management systems, NISP, SDO, LSST
  Issues: some very large tables (>10^9 records); conformity to data model; search optimization; metadata updates; metadata additions

Data management:
  Purpose: maintain and track all stored data
  Source: data storage system, SDO, SUMS, iRODS
  Issues: monitor file integrity; data storage and retrieval

Metadata standards:
  Purpose: define metadata standards (based on existing usage) to allow full description of data contents and processing through various applications
  Source: NISP, VSO, SolarNet, HEK, HELIO
  Issues: maintaining backwards compatibility; instrument builder compliance; event catalogs

Pipeline Framework:

  Purpose: produce a general infrastructure capable of managing the data calibration and processing workflows for all instruments
  Source: NISP, ESO/Gasgano, Taverna, etc.
  Issues: language compliance; workflow management tools; user uptake

Data Delivery:
  Purpose: transfer requested datasets to users or external repositories
  Source: GridFTP, SDO
  Issues: network versus physical media transfer; data format; data mirroring; data versioning; compression

3.2. Software Modules

Note: Inclusion below does not imply that it will be the role of the data center to fully develop the listed algorithms; rather, the effort will be needed to adapt existing algorithms to the data center workflow.

Compression:
  Purpose: reduce the volume of data to be stored or transferred
  Source: largely external, generalized signal processing
  Reference software packages:

  FPack:
    part of CFITSIO, developed by NASA/HEASARC
    can perform Rice, Hcompress, and Gzip compression
    lossless or quantization compression expected to be generally used throughout data handling
    Issues: multithreading, parallelizable? optimization for solar data cubes; choice of quantization levels; computing requirements?
    Collaborators: SDO/LMSAL
    Effort: 1 p-m

  JPEG/JPEG2000:
    standard, multiple implementations
    can perform lossless or lossy compression
    used to store low-fidelity copies of data
    Issues: multithreading, parallelizable? acceptable compression ratios for solar usage; computing requirements?
    Collaborators: JHelioviewer, Solar Orbiter
    Effort: 2 p-m
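As a concrete illustration of the fpack-style tile compression mentioned above, the sketch below writes an image extension with Rice compression through Astropy's FITS interface, which follows the same CFITSIO tiled-compression convention. Whether Rice, Hcompress, or GZIP is appropriate, and what quantization level to use for floating-point data, are exactly the open issues listed above; the parameters shown are placeholders.

```python
import numpy as np
from astropy.io import fits

# Simulated 16-bit detector frame; Rice compression of integer data is lossless.
frame = np.random.randint(0, 16000, size=(4096, 4096), dtype=np.int16)

hdul = fits.HDUList([
    fits.PrimaryHDU(),
    fits.CompImageHDU(data=frame, compression_type="RICE_1"),
])
hdul.writeto("frame_rice.fits", overwrite=True)

# For floating-point (e.g. calibrated) data, a quantization level makes the
# compression lossy; choosing that level is one of the issues flagged above.
# fits.CompImageHDU(data=frame.astype("float32"),
#                   compression_type="RICE_1", quantize_level=16)
```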

  3D/Movie compression:
    standard, multiple implementations
    used for lossy compression
    Issues: adaptability to solar observations; can algorithms give estimates of data quality? advantages compared to JPEG2000
    Collaborators: SolarNet, USC
    Effort: 2 p-m

Instrument signature corrections:
  Purpose: remove signatures of the instrument and telescope on the acquired data
  Source: instrument builders; existing instrument users
  High-level software packages:

  Dark/Bias Calibration:
    detector specific, straightforward
    Issues: calibration stability; calibration data validation
    Collaborators: instrument builders
    Development Effort: 2 p-m

  System Response Calibration:
    also known as flat-fielding or gain correction
    possibility of significant temporal variability
    multiple contributions to overall response
    calibration methodology is partially instrument specific
    Issues: calibration stability; calibration data validation; multiple correction methods; relationship to acquired data
    Collaborators: instrument builders, other instruments
    Development Effort: 6 p-m
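The dark/bias and system-response (flat-field) corrections above reduce, at their core, to simple per-pixel arithmetic once the calibration frames are known; the sketch below shows only that core step, with instrument-specific validation, temporal interpolation of the gain, and error propagation deliberately omitted. The array names and synthetic frames are illustrative.

```python
import numpy as np

def correct_instrument_signature(raw, dark, flat):
    """Apply dark subtraction and gain (flat-field) correction to one frame.

    raw  -- raw detector frame (counts)
    dark -- master dark/bias frame matched in exposure and detector state
    flat -- master gain table, already dark-subtracted and normalized to unit mean
    """
    gain = np.where(flat > 0, flat, np.nan)          # guard against dead pixels
    return (raw.astype(np.float64) - dark) / gain

# Illustrative usage with synthetic frames.
rng = np.random.default_rng(0)
dark = 100 + rng.normal(0, 2, (1024, 1024))          # synthetic master dark
flat = 1 + 0.05 * rng.normal(0, 1, (1024, 1024))     # synthetic gain table
raw = dark + flat * rng.poisson(5000, (1024, 1024))  # synthetic raw science frame
science = correct_instrument_signature(raw, dark, flat)
```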

  Wavelength Calibration:
    conversion of pixel positions to a wavelength scale
    calibration methodology is partially instrument specific
    Issues: calibration stability; calibration accuracy; multiple correction methods
    Collaborators: instrument builders, other instruments
    Development Effort: 3 p-m

  Coordinate System Definition:
    definition of a heliocentric coordinate grid for the data
    should cohere to FITS World Coordinate System standards
    may require information from facility systems
    Issues: coordinate conversions; mapping of irregularly gridded data; coordinate system definition complexity
    Collaborators: instrument builders, William Thompson, SDO
    Software: WCSLIB, SolarSoft
    Development Effort: 8 p-m

  Polarization Calibration:
    determine (combined) telescope and instrument matrices
    requires measurements and modeling of the optical system
    may require information from facility systems
    Issues: required level of precision may be instrument or program dependent; different contributions may be characterized at different times; calibration procedure may be instrument-dependent
    Collaborators: instrument builders, SolarNet
    Software: instrument-specific software
    Development Effort: 6 p-m
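The polarization calibration described above ultimately amounts to removing the combined telescope-plus-instrument response matrix from the measured Stokes vectors; the sketch below shows that final step for a single pixel, assuming the 4x4 Mueller matrix has already been determined from calibration measurements and modeling, which is the hard, instrument-specific part flagged above. The crosstalk value used in the example is invented for demonstration.

```python
import numpy as np

def apply_polarization_calibration(stokes_measured, mueller):
    """Recover the incident Stokes vector(s) given the system Mueller matrix.

    stokes_measured -- array of shape (4,) or (4, N) with measured I, Q, U, V
    mueller         -- 4x4 combined telescope + instrument response matrix
    """
    m_inv = np.linalg.inv(mueller)     # assumes a well-conditioned calibration
    return m_inv @ stokes_measured

# Illustrative example: a small, assumed amount of I->V crosstalk being removed.
mueller = np.eye(4)
mueller[3, 0] = 0.01                   # hypothetical crosstalk term
measured = np.array([1.0, 0.002, -0.001, 0.013])
print(apply_polarization_calibration(measured, mueller))
```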

  Deconvolution:
    correct known static aberrations
    may require knowledge of the instrument PSF and information from facility systems
    Issues: may not be required for all datasets
    Collaborators: instrument builders
    Software: R-L deconvolution, other
    Development Effort: 3 p-m

Atmospheric corrections:

  Photometric Calibration:
    conversion of measured DN to photon or physical flux units
    may require information from facility systems
    primary goal is relative flux calibration
    Issues: calibration requirements? multiple correction methods, instrument dependent; scattered light corrections?
    Collaborators: instrument builders, other instruments
    Development Effort: 2 p-m

  Destretching:
    subfield remapping of distorted images
    Issues: improved error handling; new implementation, multi-threaded; optimization of processing parameters
    Collaborators:
    Software: reg.pro
    Development Effort: p-m

  Speckle Interferometry:
    image reconstruction of image sequences
    Issues: identify failed reconstructions; optimization of processing parameters; processing requirements at the data center; AO input
    Collaborators: VBI+DHS, SolarNet
    Software: KISIP v6
    Development Effort: 4 p-m

  Blind Deconvolution:
    image reconstruction of image sequences
    Issues: identify failed reconstructions; optimization of processing parameters; processing requirements at the data center; combination of MFBD and speckle processing
    Collaborators: UiO, SolarNet, MHPCC
    Software: MOMFBD
    Development Effort: 8 p-m

  Long-Exposure Deconvolution:
    image reconstruction of image sequences
    Issues: identify suitable datasets; optimization of processing parameters; processing requirements at the data center
    Collaborators: NSO, KIS?
    Software: Jose Marino
    Development Effort: 6 p-m

Spectral Extraction:
  Purpose: measure specific predefined parameters from spectral profiles
  Source: instrument builders; existing instrument users
  High-level software packages:

  Line Fitting:
    determine model-independent parameters of the line profile
    includes moments, line-minimum position, line widths, etc.
    Issues: identify suitable/unsuitable profiles; error determination; both absorption and emission profiles
    Collaborators:
    Development Effort: 3 p-m
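For the model-independent line parameters listed above (moments, line-minimum position, widths), the sketch below computes a few such quantities for a single absorption profile with NumPy, assuming a uniform wavelength grid. Identifying unsuitable profiles and determining errors, as noted in the issues, is the real development effort and is not addressed here.

```python
import numpy as np

def line_parameters(wavelength, intensity):
    """Model-independent parameters of an absorption profile (illustrative only)."""
    continuum = np.median(intensity[[0, 1, -2, -1]])       # crude continuum estimate
    depth = np.clip(continuum - intensity, 0, None)        # line depression
    total = depth.sum()
    centroid = (wavelength * depth).sum() / total          # first moment (line position)
    variance = ((wavelength - centroid) ** 2 * depth).sum() / total
    step = wavelength[1] - wavelength[0]                   # assumes a uniform grid
    return {
        "line_minimum": wavelength[np.argmin(intensity)],
        "centroid": centroid,
        "width": np.sqrt(variance),                        # second moment as a width proxy
        "equivalent_width": (depth / continuum).sum() * step,
    }

# Synthetic Gaussian absorption line for demonstration.
wl = np.linspace(630.20, 630.30, 101)
prof = 1.0 - 0.4 * np.exp(-0.5 * ((wl - 630.25) / 0.01) ** 2)
print(line_parameters(wl, prof))
```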

  Principal Component Analysis:
    classify spectral profiles based on orthogonal decomposition
    useful for data discovery, reduction, and further scientific analysis
    Issues: computational requirements; choice of orthogonal functions
    Collaborators: HAO
    Development Effort: 3 p-m

Spectral Inversion:
  Purpose: determine the physical conditions giving rise to observed line profiles through comparison with model atmosphere profiles
  Source: community packages
  High-level software packages:

  Line Inversion:
    Milne-Eddington inversion
    Issues: applicability to different measurements; multi-line observations
    Collaborators: HAO, SDO/HMI
    Software: SIR, MERLIN, MISMA, HAZEL, Nicole, etc.
    Development Effort: n p-m

Feature Recognition:
  Purpose: use image recognition techniques to autonomously identify and characterize solar features present in datasets
  Collaborators: HEK, SIPWork, SDO FFT
  High-level software packages:

  Feature Identification:
    Many - TBD
    Issues: robustness in the face of (residual) atmospheric distortions; multi-line observations
    Collaborators: USC, MSU, NSO, others
    Software:
    Development Effort: n p-m

Feature Tracking:
  Purpose: use object tracking algorithms to follow features in temporal sequences
  Collaborators: USC
  High-level software packages:

  Motion Tracking:
    Many - TBD
    Issues: robustness in the face of (residual) atmospheric distortions; multi-line observations
    Collaborators: USC, MSU
    Software:
    Development Effort: n p-m


More information

Quantum StorNext. Product Brief: Distributed LAN Client

Quantum StorNext. Product Brief: Distributed LAN Client Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without

More information

STORNEXT PRO SOLUTIONS. StorNext Pro Solutions

STORNEXT PRO SOLUTIONS. StorNext Pro Solutions STORNEXT PRO SOLUTIONS StorNext Pro Solutions StorNext PRO SOLUTIONS StorNext Pro Solutions offer Post-Production and Broadcast Professionals the fastest, easiest, and most complete high-performance shared

More information

Tandberg Data AccuVault RDX

Tandberg Data AccuVault RDX Tandberg Data AccuVault RDX Binary Testing conducts an independent evaluation and performance test of Tandberg Data s latest small business backup appliance. Data backup is essential to their survival

More information

With DDN Big Data Storage

With DDN Big Data Storage DDN Solution Brief Accelerate > ISR With DDN Big Data Storage The Way to Capture and Analyze the Growing Amount of Data Created by New Technologies 2012 DataDirect Networks. All Rights Reserved. The Big

More information

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Design and Implementation of a Storage Repository Using Commonality Factoring IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Axion Overview Potentially infinite historic versioning for rollback and

More information

IBM Tivoli Storage Manager Version 7.1.4. Introduction to Data Protection Solutions IBM

IBM Tivoli Storage Manager Version 7.1.4. Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.4 Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.4 Introduction to Data Protection Solutions IBM Note: Before you use this

More information

Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution. Database Solutions Engineering

Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution. Database Solutions Engineering Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution Database Solutions Engineering By Subhashini Prem and Leena Kushwaha Dell Product Group March 2009 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

Technical. Overview. ~ a ~ irods version 4.x

Technical. Overview. ~ a ~ irods version 4.x Technical Overview ~ a ~ irods version 4.x The integrated Ru e-oriented DATA System irods is open-source, data management software that lets users: access, manage, and share data across any type or number

More information

Protecting enterprise servers with StoreOnce and CommVault Simpana

Protecting enterprise servers with StoreOnce and CommVault Simpana Technical white paper Protecting enterprise servers with StoreOnce and CommVault Simpana HP StoreOnce Backup systems Table of contents Introduction 2 Technology overview 2 HP StoreOnce Backup systems key

More information

A View on the Future of Tape

A View on the Future of Tape R. Fontana, G. Decad IBM Systems September 10, 2015 A View on the Future of Tape 2015 IBM Corporation 1 A View on the Future of Tape TAPE, HDD, NAND Flash are alive and evolving Post Consumer Nature of

More information

Simplifying Storage Operations By David Strom (published 3.15 by VMware) Introduction

Simplifying Storage Operations By David Strom (published 3.15 by VMware) Introduction Simplifying Storage Operations By David Strom (published 3.15 by VMware) Introduction There are tectonic changes to storage technology that the IT industry hasn t seen for many years. Storage has been

More information

Document Image Archive Transfer from DOS to UNIX

Document Image Archive Transfer from DOS to UNIX Document Image Archive Transfer from DOS to UNIX Susan E. Hauser, Michael J. Gill, George R. Thoma Lister Hill National Center for Biomedical Communications National Library of Medicine Bethesda, Maryland

More information

Intro to Data Management. Chris Jordan Data Management and Collections Group Texas Advanced Computing Center

Intro to Data Management. Chris Jordan Data Management and Collections Group Texas Advanced Computing Center Intro to Data Management Chris Jordan Data Management and Collections Group Texas Advanced Computing Center Why Data Management? Digital research, above all, creates files Lots of files Without a plan,

More information

HP Smart Array Controllers and basic RAID performance factors

HP Smart Array Controllers and basic RAID performance factors Technical white paper HP Smart Array Controllers and basic RAID performance factors Technology brief Table of contents Abstract 2 Benefits of drive arrays 2 Factors that affect performance 2 HP Smart Array

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

WHITE PAPER. Reinventing Large-Scale Digital Libraries With Object Storage Technology

WHITE PAPER. Reinventing Large-Scale Digital Libraries With Object Storage Technology WHITE PAPER Reinventing Large-Scale Digital Libraries With Object Storage Technology CONTENTS Introduction..........................................................................3 Hitting The Limits

More information

Hadoop in the Hybrid Cloud

Hadoop in the Hybrid Cloud Presented by Hortonworks and Microsoft Introduction An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big

More information

Introduction to Optical Archiving Library Solution for Long-term Data Retention

Introduction to Optical Archiving Library Solution for Long-term Data Retention Introduction to Optical Archiving Library Solution for Long-term Data Retention CUC Solutions & Hitachi-LG Data Storage 0 TABLE OF CONTENTS 1. Introduction... 2 1.1 Background and Underlying Requirements

More information

Implementing Offline Digital Video Storage using XenData Software

Implementing Offline Digital Video Storage using XenData Software using XenData Software XenData software manages data tape drives, optionally combined with a tape library, on a Windows Server 2003 platform to create an attractive offline storage solution for professional

More information

Leveraging Cloud Storage with SharePoint and StoragePoint

Leveraging Cloud Storage with SharePoint and StoragePoint Leveraging Cloud Storage with SharePoint and StoragePoint By Chris Geier Published: June 2010 CONTENTS OVERVIEW AND SCOPE...2 ABOUT STORAGEPOINT... 2 WHAT IS CLOUD COMPUTING?...2 SHAREPOINT IN THE CLOUD?...3

More information

Software Design Proposal Scientific Data Management System

Software Design Proposal Scientific Data Management System Software Design Proposal Scientific Data Management System Alex Fremier Associate Professor University of Idaho College of Natural Resources Colby Blair Computer Science Undergraduate University of Idaho

More information

Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software

Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software The Video Edition of XenData Archive Series software manages one or more automated data tape libraries on

More information

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything BlueArc unified network storage systems 7th TF-Storage Meeting Scale Bigger, Store Smarter, Accelerate Everything BlueArc s Heritage Private Company, founded in 1998 Headquarters in San Jose, CA Highest

More information

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General

More information

Redefining Microsoft SQL Server Data Management. PAS Specification

Redefining Microsoft SQL Server Data Management. PAS Specification Redefining Microsoft SQL Server Data Management APRIL Actifio 11, 2013 PAS Specification Table of Contents Introduction.... 3 Background.... 3 Virtualizing Microsoft SQL Server Data Management.... 4 Virtualizing

More information

June 2009. Blade.org 2009 ALL RIGHTS RESERVED

June 2009. Blade.org 2009 ALL RIGHTS RESERVED Contributions for this vendor neutral technology paper have been provided by Blade.org members including NetApp, BLADE Network Technologies, and Double-Take Software. June 2009 Blade.org 2009 ALL RIGHTS

More information

Optimizing Dell PowerEdge Configurations for Hadoop

Optimizing Dell PowerEdge Configurations for Hadoop Optimizing Dell PowerEdge Configurations for Hadoop Understanding how to get the most out of Hadoop running on Dell hardware A Dell technical white paper July 2013 Michael Pittaro Principal Architect,

More information

Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk

Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk WHITE PAPER Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk 951 SanDisk Drive, Milpitas, CA 95035 2015 SanDisk Corporation. All rights reserved. www.sandisk.com Table of Contents Introduction

More information

Keys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International

Keys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International Keys to Successfully Architecting your DSI9000 Virtual Tape Library By Chris Johnson Dynamic Solutions International July 2009 Section 1 Executive Summary Over the last twenty years the problem of data

More information

Data Deduplication in Tivoli Storage Manager. Andrzej Bugowski 19-05-2011 Spała

Data Deduplication in Tivoli Storage Manager. Andrzej Bugowski 19-05-2011 Spała Data Deduplication in Tivoli Storage Manager Andrzej Bugowski 19-05-2011 Spała Agenda Tivoli Storage, IBM Software Group Deduplication concepts Data deduplication in TSM 6.1 Planning for data deduplication

More information

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director brett.weninger@adurant.com Dave Smelker, Managing Principal dave.smelker@adurant.com

More information

Tiered Data Protection Strategy Data Deduplication. Thomas Störr Sales Director Central Europe November 8, 2007

Tiered Data Protection Strategy Data Deduplication. Thomas Störr Sales Director Central Europe November 8, 2007 Tiered Data Protection Strategy Data Deduplication Thomas Störr Sales Director Central Europe November 8, 2007 Overland Storage Tiered Data Protection = Good = Better = Best! NEO / ARCvault REO w/ expansion

More information

Cost Effective Backup with Deduplication. Copyright 2009 EMC Corporation. All rights reserved.

Cost Effective Backup with Deduplication. Copyright 2009 EMC Corporation. All rights reserved. Cost Effective Backup with Deduplication Agenda Today s Backup Challenges Benefits of Deduplication Source and Target Deduplication Introduction to EMC Backup Solutions Avamar, Disk Library, and NetWorker

More information

Deploying a distributed data storage system on the UK National Grid Service using federated SRB

Deploying a distributed data storage system on the UK National Grid Service using federated SRB Deploying a distributed data storage system on the UK National Grid Service using federated SRB Manandhar A.S., Kleese K., Berrisford P., Brown G.D. CCLRC e-science Center Abstract As Grid enabled applications

More information

A New Data Visualization and Analysis Tool

A New Data Visualization and Analysis Tool Title: A New Data Visualization and Analysis Tool Author: Kern Date: 22 February 2013 NRAO Doc. #: Version: 1.0 A New Data Visualization and Analysis Tool PREPARED BY ORGANIZATION DATE Jeff Kern NRAO 22

More information

Dramatically Lowering Storage Costs

Dramatically Lowering Storage Costs Dramatically Lowering Storage Costs 2012 Peter McGonigal petermc@sgi.com James Hill jamesh@sgi.com Version 7 Analysts and observers of today's digital age are forecasting exponential growth of data volumes

More information

STORNEXT PRO SOLUTIONS. StorNext Pro Solutions

STORNEXT PRO SOLUTIONS. StorNext Pro Solutions STORNEXT PRO SOLUTIONS StorNext Pro Solutions StorNext PRO SOLUTIONS StorNext Pro Solutions offer post-production and broadcast professionals the fastest, easiest, and most complete high-performance shared

More information

EMC PERSPECTIVE. An EMC Perspective on Data De-Duplication for Backup

EMC PERSPECTIVE. An EMC Perspective on Data De-Duplication for Backup EMC PERSPECTIVE An EMC Perspective on Data De-Duplication for Backup Abstract This paper explores the factors that are driving the need for de-duplication and the benefits of data de-duplication as a feature

More information

QUICK REFERENCE GUIDE: KEY FEATURES AND BENEFITS

QUICK REFERENCE GUIDE: KEY FEATURES AND BENEFITS QUICK REFERENCE GUIDE: FOR SMALL TO MEDIUM-SIZE BUSINESSES DISK-BASED BACKUP DXi4000 SERIES DEDUPLICATION APPLIANCES Patented data deduplication technology reduces disk requirements by 90% or more Scalable

More information

EX ECUT IV E ST RAT EG Y BR IE F. Server Design for Microsoft s Cloud Infrastructure. Cloud. Resources

EX ECUT IV E ST RAT EG Y BR IE F. Server Design for Microsoft s Cloud Infrastructure. Cloud. Resources EX ECUT IV E ST RAT EG Y BR IE F Server Design for Microsoft s Cloud Infrastructure Cloud Resources 01 Server Design for Cloud Infrastructure / Executive Strategy Brief Cloud-based applications are inherently

More information

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007 Data Management in an International Data Grid Project Timur Chabuk 04/09/2007 Intro LHC opened in 2005 several Petabytes of data per year data created at CERN distributed to Regional Centers all over the

More information

Taking Linux File and Storage Systems into the Future. Ric Wheeler Director Kernel File and Storage Team Red Hat, Incorporated

Taking Linux File and Storage Systems into the Future. Ric Wheeler Director Kernel File and Storage Team Red Hat, Incorporated Taking Linux File and Storage Systems into the Future Ric Wheeler Director Kernel File and Storage Team Red Hat, Incorporated 1 Overview Going Bigger Going Faster Support for New Hardware Current Areas

More information

Backup and Recovery: The Benefits of Multiple Deduplication Policies

Backup and Recovery: The Benefits of Multiple Deduplication Policies Backup and Recovery: The Benefits of Multiple Deduplication Policies NOTICE This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change

More information

Observing Data Quality Service Level Agreements: Inspection, Monitoring, and Tracking

Observing Data Quality Service Level Agreements: Inspection, Monitoring, and Tracking A DataFlux White Paper Prepared by: David Loshin Observing Data Quality Service Level Agreements: Inspection, Monitoring, and Tracking Leader in Data Quality and Data Integration www.dataflux.com 877 846

More information

Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000

Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 Your Data, Any Place, Any Time Executive Summary: More than ever, organizations rely on data

More information

SmartSync Backup Efficient NAS-to-NAS backup

SmartSync Backup Efficient NAS-to-NAS backup Allion Ingrasys Europe SmartSync Backup Efficient NAS-to-NAS backup 1. Abstract A common approach to back up data stored in a NAS server is to run backup software on a Windows or UNIX systems and back

More information