UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure
Authors: A O Jaunsen, G S Dahiya, H A Eide, E Midttun
Date: Dec 15, 2015

Summary

Uninett Sigma2 provides High Performance Computing (HPC) and Data services to the national research sector. On October 30, 2015 the Sigma2 board decided that the storage infrastructure should be based on a distributed data centre (DDC) model in which the storage resources are spread across two sites that appear and function as a single data infrastructure. Each of the two HPC systems to be procured will also be located in the same facilities as the storage resources. This comes at a cost increase due to operating two sites and an overhead on resource capacities, but provides the benefit of dual-site reliability.

The user requirements suggest that data must be connected to resources that allow users to process, analyse, share and publish the data. The data must be accessible in a simple and consistent manner that accommodates not only traditional compute-intensive users, but also new, data-driven user communities. Research infrastructures and larger laboratory facilities that generate large amounts of data must be able to ingest their data via effective protocols supported by the data infrastructure. Data must be stored safely with a redundancy that will withstand failure at any level (disk, server, rack, site), while at the same time avoiding unnecessary duplication of data, governed by per-project policy. The requirements for a given research dataset may change during its lifetime, and the infrastructure should enable seamless migration of data between the relevant storage devices based on data management policies and usage. The future national data infrastructure must be a scalable, reliable and flexible facility that can accommodate the vast majority of relevant users and communities.
In the distributed infrastructure, the physical storage resources are located at separate geographical locations (hundreds to a thousand kilometres apart) and thus comprise at least two data centres. This infrastructure relies on replicating data between the two data centres with a redundancy that allows one of the data centres to be lost or become unavailable without losing access to the data, as it is also stored on the other site. This solution relies on adequate network bandwidth between the data centres. Services are provided at both data centres, and each of the HPC systems is directly connected to the corresponding data store to achieve good integration among all infrastructure components.
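The redundancy rule above can be reduced to a simple placement check: an object survives the loss of one site only if at least one replica lives on another site. The following is a minimal sketch of that check; the site names, object keys and the dict-based placement map are illustrative assumptions, not part of the Sigma2 design.

```python
# Illustrative check of the dual-site redundancy rule described above:
# an object remains available after losing a site only if at least one
# replica lives elsewhere. Site and object names are assumptions.

def survives_site_loss(replica_sites, lost_site):
    """True if copies remain after the given site becomes unavailable."""
    return bool(replica_sites - {lost_site})

placements = {
    "climate/run42.nc": {"site-a", "site-b"},  # replicated: safe
    "tmp/scratch.dat":  {"site-a"},            # single copy: at risk
}

print(survives_site_loss(placements["climate/run42.nc"], "site-a"))  # True
print(survives_site_loss(placements["tmp/scratch.dat"], "site-a"))   # False
```

A real system would run this check continuously and re-replicate any object that fails it, either within the surviving site or to a third (backup) site, as the report describes.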
A major challenge for any future data infrastructure is how to tackle the growth of data. The data does not only need to be stored safely, but also needs to be connected to resources that enable services such as compute, visualisation and analysis. Data accessibility is therefore of key importance. Transporting data is already an increasing challenge, with the growth of data exceeding network capacity increments, and dedicated high-bandwidth network links come at a significant cost. A data-centric architecture is therefore an appealing concept that is well suited to meet the challenges of a future national e-infrastructure for scientific research.

Data Life Cycle

Data is categorised in different classes that represent the relevant stages of the data life cycle. This cycle typically starts with the creation of data in laboratories, experiments or computing environments, and encompasses an active phase in which the data is processed and prepared for interpretation and analysis. When the data has been thoroughly analysed, it is expected to be archived and possibly published. The illustration shows the typical phases of the scientific process from a data life cycle point of view.
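The life cycle described above can be viewed as a small state machine: data moves from creation through an active phase to archival and possible publication. The sketch below makes the phases and transitions explicit; the phase names and allowed transitions are assumptions drawn from this description, not a Sigma2 specification.

```python
# Illustrative state machine for the data life cycle described above.
# Phase names and transitions are assumptions drawn from the text.

ALLOWED_TRANSITIONS = {
    "created":   {"active"},                # data from lab, experiment or compute
    "active":    {"archived"},              # processed, interpreted, analysed
    "archived":  {"published", "deleted"},  # archived data may be published
    "published": {"curated", "deleted"},    # long-term preservation or expiry
    "curated":   set(),                     # permanent storage, preservation plan
}

def advance(phase, target):
    """Move a dataset to the next life-cycle phase if the transition is valid."""
    if target not in ALLOWED_TRANSITIONS.get(phase, set()):
        raise ValueError(f"cannot move from {phase!r} to {target!r}")
    return target

phase = "created"
for nxt in ("active", "archived", "published"):
    phase = advance(phase, nxt)
print(phase)  # published
```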
Data classification

Currently research data are dispersed over four HPC systems and one data infrastructure in Norway. The goal is to consolidate these data in one infrastructure. On the HPC systems three different types of data stores are found: i) runtime data produced during computations (/work), ii) users' own data on the system (/home) and iii) project data stored on dedicated disks (/project). The runtime data storage (i) has strict performance and integration requirements and will therefore be procured as part of the HPC system. The remaining data stores (ii+iii) will be hosted on the new data infrastructure, including non-HPC /project data. The following overview describes the various data classes and suggests what type of data may be found in each.

raw (/project)
Data that is not reproducible and in its pristine state is considered raw. Such data is typically the result of recorded measurements by an instrument or experiment, typically requiring further processing to become meaningful. Raw data is static by definition and may be required in the future to enable reproduction of previous results or to verify or dismiss claimed errors. This class of data is therefore valuable, and best practices often recommend that such data be secured for the future.

hot (/home, /project, /work)
Data that is in active use and accessed frequently is dubbed hot. Hot data is typically accessed and processed, and can serve as input to new calculations on an HPC system or an analytics service, for instance. It is therefore necessary to maintain this data on a storage technology with high read and write performance that at the same time copes with multiple users and processes. The performance is determined by the connectivity between the storage system and the compute resources (such as HPC). The filesystem performance may be improved by using fast storage media (SSDs) for caching data.

cold (/project, /home)
Data that is still relevant, but accessed less frequently. Cold data should be accessible via various protocols (e.g. POSIX, S3, HTTP REST), but can be stored on consumer hardware (e.g. SATA drives).

copy (/project/copy)
Data that serves as a (backup) copy only. This is a subclass of cold data and is only accessed in the event of the (external) data becoming corrupted. Data of this type must be stored with a reference checksum value and can be stored on very cost-effective high-density disks such as Shingled Magnetic Recording (SMR) disks.

published (not mounted)
Data that is archived and published by issuing a DOI. Such data is typically archived for several years, but with no curation requirements. It is expected that the data is no longer useful after a decade and can, in principle, be deleted after such time.

curated (not mounted)
Published data that has a (documented) need for long-term preservation and a permanent storage requirement must be curated by a data librarian and set up with a preservation plan.

Infrastructure configuration

To safeguard against loss of data in the event of catastrophic events such as a fire or flooding, it is necessary to have data redundancy between two sites. In this way one data centre site can be lost to fire or flooding while the other site still provides access to the data and services. In the unlikely event of such a scenario, the data redundancy must be restored within the remaining site (provided sufficient resources are available) or to a third (backup) site. A national data infrastructure will therefore have a minimum of two sites for reliability and availability reasons. Below we describe a possible architecture to achieve this in a two-site scenario.

Distributed Data Centre

In the Distributed Data Centre (DDC) configuration, the two sites are geographically separated by distances of typically hundreds of kilometres.
This means that two data centres are required and the interconnect between them will rely on a Wide Area Network (WAN), either via the national research network (Forskningsnettet) or a dedicated fibre. Latency limits the performance of synchronous data replication over such distances, so it is necessary to rely on asynchronous replication between sites. Within a site this configuration retains the performance between the compute services (e.g. HPC, data analytics, visualisation services) and the stored data, but the data replication between site A and site B can and will be asynchronous (data synchronisation is not guaranteed at all times).
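The asynchronous scheme can be sketched as a write queue: a write completes as soon as it lands on the local site, and a background process later ships queued updates over the WAN, so the remote site is eventually consistent. This is a minimal single-process sketch under assumed names; the dict-based "sites" and the queue stand in for real storage systems and replication daemons.

```python
# Minimal sketch of asynchronous geo-replication between two sites, as
# described above: writes are synchronous on the local site and deferred
# to the remote site, so site B lags site A until the queue is drained.
# The dict "sites" and function names are illustrative assumptions.

from collections import deque

site_a = {}                  # local data centre
site_b = {}                  # remote data centre, hundreds of km away
replication_queue = deque()  # updates waiting to cross the WAN link

def write(key, value):
    """Synchronous local write; replication to site B is deferred."""
    site_a[key] = value
    replication_queue.append((key, value))

def replicate_pending():
    """Drain the queue over the WAN; returns the number of objects shipped."""
    shipped = 0
    while replication_queue:
        key, value = replication_queue.popleft()
        site_b[key] = value
        shipped += 1
    return shipped

write("dataset/genome.fa", b"ACGT")
print(site_b.get("dataset/genome.fa"))  # None: not yet replicated
replicate_pending()
print(site_b["dataset/genome.fa"])      # b'ACGT': sites now in sync
```

The window between `write` and `replicate_pending` is exactly the inconsistency the report accepts in exchange for avoiding WAN latency on every write.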
This configuration has the benefit that it withstands catastrophic events that would take out one entire data centre, while maintaining all data intact on the remaining site. It does, however, require a higher degree of storage redundancy.

User Requirements

UNINETT Sigma2 completed a user survey in June 2015 to gather current and future requirements from scientific communities in Norway. The results show a strong dependence on the infrastructure and frequent use of the key NorStore data storage services such as the project area, the NorStore archive and the services for sensitive data. Across these use cases, users have requested that data be kept from three (3) months to several years.

Users are currently required to use traditional SSH/SFTP-based tools to access their data and the available resources. There is a demand for more user-friendly and flexible services, e.g. a desktop client using WebDAV/SMB or a Dropbox-like Sync n Share service to interact with data services. Currently, when users need to process data stored in the national infrastructure, they must copy this data manually to the HPC project space. This duplicates data and limits users to the storage capacity available on the HPC facility. Moreover, it results in data getting out of sync due to manual copying and modification, which leads to a poor user experience. In the survey, users asked for data to be directly accessible on the HPC system for processing. This would give better utilisation of storage resources by avoiding duplication and offer a smoother user experience. Finally, there is an increasing need for storage and data management services for non-HPC users, in particular shared project areas with fine-grained access control, metadata and publishing services.

In addition to the current data services, users have expressed interest in dedicated (compute) resources, data analytics and visualisation services. Dedicated resources enable users to have a certain amount of reserved compute and storage to perform certain tasks at short notice and with high priority. The compute and storage resources would be permanently reserved for the duration of the allocation. The data analytics service concerns analysing big data using frameworks such as Apache Spark/Hadoop. The visualisation service offers remote visualisation of large datasets using dedicated hardware, e.g. GPUs.

In a data-centric infrastructure, laboratories and research infrastructures generating large or steady streams of data should be able to store data in and use the national data infrastructure. Research institutions can benefit from permanently connecting their resources to the national data infrastructure via suitable protocols such as S3, REST or Sync n Share (Dropbox-like). Examples of such facilities are genome sequencers or other high-data-volume research instruments.
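Instrument ingest of the kind described above pairs naturally with the reference checksums required for the copy data class: each deposited object is stored together with a digest that later reads can be verified against. The sketch below shows the idea with a plain dict standing in for the object store; the interface, key names and use of SHA-256 are illustrative assumptions, since the document only requires that a reference checksum be kept.

```python
# Hedged sketch of checksum-verified ingest, e.g. a genome sequencer
# depositing a run into the national store. The dict-based object store
# and function names are assumptions; the report only requires that
# copies carry a reference checksum for later integrity checks.

import hashlib

object_store = {}  # key -> payload (stand-in for S3/REST-backed storage)
checksums = {}     # key -> reference SHA-256 hex digest

def ingest(key, payload):
    """Store the payload and record its SHA-256 as the reference checksum."""
    digest = hashlib.sha256(payload).hexdigest()
    object_store[key] = payload
    checksums[key] = digest
    return digest

def verify(key):
    """Re-read the object and compare against its reference checksum."""
    return hashlib.sha256(object_store[key]).hexdigest() == checksums[key]

ingest("run-001/reads.fastq", b"@read1\nACGT\n+\nFFFF\n")
print(verify("run-001/reads.fastq"))  # True
```

In a real deployment the digest would be computed client-side before upload and compared server-side on receipt, so corruption in transit is caught at deposit time rather than years later.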
A Service Oriented National e-Infrastructure

The HPC system requires a fast, low-latency scratch storage space, usually available under /work. This storage space will be part of a separate storage system and is procured with the HPC system. In addition to scratch space, an HPC system requires access to users' /home directories and project data storage. The /home storage represents the space allocated to users to store their code or configuration data. The project storage represents the space allocated to large data sets, e.g. databases of genomes, reference data etc.; this space is usually shared among a group of users. Access to these data stores should be provided to HPC users via a parallel POSIX file system and/or HTTP REST-based protocols.

Access to data is needed not only by compute-intensive services such as HPC, but also by data-intensive services. These are services where the number of operations per byte is very small, such as scaling all entries in a file, changing the format of an image or a genome, or transcoding a video. Other data-centric services include visualisation and animation, which often require large datasets to be processed into images that are either displayed or sequenced into animations.

With the recent increase in the amount of unstructured and machine-generated data, many open-source frameworks have been developed to process large data sets in parallel on commodity hardware. Such processing requires access to vast quantities of data collected from different sources such as sensor arrays, data from the internet and genomic data. The frameworks, e.g. Apache Spark/Hadoop, process such datasets in a distributed, fault-tolerant way and enable analytics at large scale.

Currently a national sensitive data service is provided by the University of Oslo (USIT). The sensitive data service is an important part of the national services, and a significant number of users within the fields of medicine and life science rely on it today.
UNINETT Sigma2 AS has supported the development and is currently offering resources to this service from the national storage resource pool. It may be challenging to migrate the service during the first year of operation of the new infrastructure, and it is therefore likely that the service will continue to be provided as it is today. The storage infrastructure should be able to serve the needs of the various national services and satisfy performance requirements by combining different storage media, e.g. SSDs and SATA.

Storage Requirements for Services

Taking the user requirements for current and future services into consideration, we require the storage solution to be scalable, reliable, flexible and policy-driven. The focus of the storage system is to be an enabler of new services and to allow users to interact with the national e-infrastructure in an intuitive and flexible way. The storage system should provide a global name space, enabling users to access their data from any resource or site. It should support the use of different storage technologies to balance high performance (read and write) against low-cost capacity storage. Depending on the access pattern of specific data objects, the system should automatically move data between performance and capacity storage pools, e.g. moving data that is infrequently modified or accessed from the performance pool to the more cost-effective capacity pool (such as erasure-coded storage on SATA disks).

Data in the storage system should be accessible from all the national services. Users should be able to deposit data using the different protocols listed in the table below. Once data is received in the storage system, users can access it from the HPC systems or any other service offered by the national e-infrastructure. The storage system should be free from single points of failure (SPOFs), mainly by ensuring component and data redundancy, and should be able to operate normally under disk, controller, server, rack or site failure, given enough free capacity. To provide redundancy in case of fire or natural disaster, the storage system needs to support geo-replication across a minimum of two sites. The replication can be performed asynchronously to avoid the performance penalty and to reduce the required network bandwidth between the two sites.

Protocols for Interacting with Data Storage Resources

The table below lists the expected relevant protocols for various key services.
Service                    Protocols
User data deposit/access   Sync n Share (WebDAV or similar), POSIX, SSH/SFTP, object-based HTTP REST API (S3)
HPC data access            Parallel POSIX-compliant file system, object-based HTTP REST API (S3)
Data analytics             Parallel POSIX/Hadoop-compliant file system, object-based HTTP REST API (S3)
Dedicated resources        Block storage for virtual machines, POSIX/SMB/CIFS file system access
Visualisation              Parallel POSIX-compliant file system

Backup service

It is necessary to offer a backup service that secures the history (changes and deletions) of the /home area. It will also be required to provide a form of backup for /project data, via snapshots taken locally on the storage system or some other form of undelete functionality. The backup service should be cost-effective enough to be feasible for multi-petabyte storage. Vendors should state their support for various backup software.
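The policy-driven migration between performance and capacity pools described under "Storage Requirements for Services" can be sketched as a simple recency rule: objects that have not been accessed recently are candidates for the cheaper pool. The 30-day threshold and pool names below are illustrative assumptions, not values taken from the document.

```python
# Sketch of policy-driven tiering: data that is infrequently accessed
# moves from the performance (SSD) pool to the cost-effective capacity
# pool. Threshold and pool names are illustrative assumptions.

DAY = 24 * 3600
COLD_AFTER = 30 * DAY  # assumed policy: an object is "cold" after 30 days

def choose_pool(seconds_since_access):
    """Pick the pool an object should live in, given its access recency."""
    return "capacity-pool" if seconds_since_access > COLD_AFTER else "ssd-pool"

print(choose_pool(3600))      # ssd-pool: accessed an hour ago
print(choose_pool(40 * DAY))  # capacity-pool: untouched for 40 days
```

A production policy engine would also consider object size, project policy and the data class (hot, cold, copy), and would move data back to the performance pool when access resumes.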
More informationEMC DATA DOMAIN OPERATING SYSTEM
ESSENTIALS HIGH-SPEED, SCALABLE DEDUPLICATION Up to 58.7 TB/hr performance Reduces protection storage requirements by 10 to 30x CPU-centric scalability DATA INVULNERABILITY ARCHITECTURE Inline write/read
More informationThe Microsoft Large Mailbox Vision
WHITE PAPER The Microsoft Large Mailbox Vision Giving users large mailboxes without breaking your budget Introduction Giving your users the ability to store more e mail has many advantages. Large mailboxes
More informationUniFS A True Global File System
UniFS A True Global File System Introduction The traditional means to protect file data by making copies, combined with the need to provide access to shared data from multiple locations, has created an
More informationEvery organization has critical data that it can t live without. When a disaster strikes, how long can your business survive without access to its
DISASTER RECOVERY STRATEGIES: BUSINESS CONTINUITY THROUGH REMOTE BACKUP REPLICATION Every organization has critical data that it can t live without. When a disaster strikes, how long can your business
More informationAttunity RepliWeb Event Driven Jobs
Attunity RepliWeb Event Driven Jobs Software Version 5.2 June 25, 2012 RepliWeb, Inc., 6441 Lyons Road, Coconut Creek, FL 33073 Tel: (954) 946-2274, Fax: (954) 337-6424 E-mail: info@repliweb.com, Support:
More informationEMC XTREMIO EXECUTIVE OVERVIEW
EMC XTREMIO EXECUTIVE OVERVIEW COMPANY BACKGROUND XtremIO develops enterprise data storage systems based completely on random access media such as flash solid-state drives (SSDs). By leveraging the underlying
More informationJournal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra
More informationAdvanced Knowledge and Understanding of Industrial Data Storage
Dec. 3 rd 2013 Advanced Knowledge and Understanding of Industrial Data Storage By Jesse Chuang, Senior Software Manager, Advantech With the popularity of computers and networks, most enterprises and organizations
More informationGoogle File System. Web and scalability
Google File System Web and scalability The web: - How big is the Web right now? No one knows. - Number of pages that are crawled: o 100,000 pages in 1994 o 8 million pages in 2005 - Crawlable pages might
More informationBackup and Recovery 1
Backup and Recovery What is a Backup? Backup is an additional copy of data that can be used for restore and recovery purposes. The Backup copy is used when the primary copy is lost or corrupted. This Backup
More informationHow to Choose your Red Hat Enterprise Linux Filesystem
How to Choose your Red Hat Enterprise Linux Filesystem EXECUTIVE SUMMARY Choosing the Red Hat Enterprise Linux filesystem that is appropriate for your application is often a non-trivial decision due to
More informationActifio Big Data Director. Virtual Data Pipeline for Unstructured Data
Actifio Big Data Director Virtual Data Pipeline for Unstructured Data Contact Actifio Support As an Actifio customer, you can get support for all Actifio products through the Support Portal at http://support.actifio.com/.
More informationCS2510 Computer Operating Systems
CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction
More informationCS2510 Computer Operating Systems
CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction
More informationKeys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International
Keys to Successfully Architecting your DSI9000 Virtual Tape Library By Chris Johnson Dynamic Solutions International July 2009 Section 1 Executive Summary Over the last twenty years the problem of data
More informationSURFsara Data Services
SURFsara Data Services SUPPORTING DATA-INTENSIVE SCIENCES Mark van de Sanden The world of the many Many different users (well organised (international) user communities, research groups, universities,
More informationT a c k l i ng Big Data w i th High-Performance
Worldwide Headquarters: 211 North Union Street, Suite 105, Alexandria, VA 22314, USA P.571.296.8060 F.508.988.7881 www.idc-gi.com T a c k l i ng Big Data w i th High-Performance Computing W H I T E P A
More informationHadoop: Embracing future hardware
Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop
More informationOracle Database 10g: Backup and Recovery 1-2
Oracle Database 10g: Backup and Recovery 1-2 Oracle Database 10g: Backup and Recovery 1-3 What Is Backup and Recovery? The phrase backup and recovery refers to the strategies and techniques that are employed
More informationDisk-to-Disk-to-Offsite Backups for SMBs with Retrospect
Disk-to-Disk-to-Offsite Backups for SMBs with Retrospect Abstract Retrospect backup and recovery software provides a quick, reliable, easy-to-manage disk-to-disk-to-offsite backup solution for SMBs. Use
More informationIT Service Management
IT Service Management Service Continuity Methods (Disaster Recovery Planning) White Paper Prepared by: Rick Leopoldi May 25, 2002 Copyright 2001. All rights reserved. Duplication of this document or extraction
More informationInformix Dynamic Server May 2007. Availability Solutions with Informix Dynamic Server 11
Informix Dynamic Server May 2007 Availability Solutions with Informix Dynamic Server 11 1 Availability Solutions with IBM Informix Dynamic Server 11.10 Madison Pruet Ajay Gupta The addition of Multi-node
More informationThe Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets
The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and
More informationSolution Brief: Creating Avid Project Archives
Solution Brief: Creating Avid Project Archives Marquis Project Parking running on a XenData Archive Server provides Fast and Reliable Archiving to LTO or Sony Optical Disc Archive Cartridges Summary Avid
More informationObject Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.
Object Storage: A Growing Opportunity for Service Providers Prepared for: White Paper 2012 Neovise, LLC. All Rights Reserved. Introduction For service providers, the rise of cloud computing is both a threat
More informationTABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models
1 THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY TABLE OF CONTENTS 3 Introduction 14 Examining Third-Party Replication Models 4 Understanding Sharepoint High Availability Challenges With Sharepoint
More informationAnalyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution
Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution Jonathan Halstuch, COO, RackTop Systems JHalstuch@racktopsystems.com Big Data Invasion We hear so much on Big Data and
More information(Scale Out NAS System)
For Unlimited Capacity & Performance Clustered NAS System (Scale Out NAS System) Copyright 2010 by Netclips, Ltd. All rights reserved -0- 1 2 3 4 5 NAS Storage Trend Scale-Out NAS Solution Scaleway Advantages
More informationUsing Live Sync to Support Disaster Recovery
Using Live Sync to Support Disaster Recovery SIMPANA VIRTUAL SERVER AGENT FOR VMWARE Live Sync uses backup data to create and maintain a warm disaster recovery site. With backup and replication from a
More informationMigration and Disaster Recovery Underground in the NEC / Iron Mountain National Data Center with the RackWare Management Module
Migration and Disaster Recovery Underground in the NEC / Iron Mountain National Data Center with the RackWare Management Module WHITE PAPER May 2015 Contents Advantages of NEC / Iron Mountain National
More informationCHAPTER 7 SUMMARY AND CONCLUSION
179 CHAPTER 7 SUMMARY AND CONCLUSION This chapter summarizes our research achievements and conclude this thesis with discussions and interesting avenues for future exploration. The thesis describes a novel
More informationHuawei OceanStor Backup Software Technical White Paper for NetBackup
Huawei OceanStor Backup Software Technical White Paper for NetBackup Huawei Page 1 of 14 Copyright Huawei. 2014. All rights reserved. No part of this document may be reproduced or transmitted in any form
More informationVMware VDR and Cloud Storage: A Winning Backup/DR Combination
VMware VDR and Cloud Storage: A Winning Backup/DR Combination 7/29/2010 CloudArray, from TwinStrata, and VMware Data Recovery combine to provide simple, fast and secure backup: On-site and Off-site The
More informationFour Ways High-Speed Data Transfer Can Transform Oil and Gas WHITE PAPER
Transform Oil and Gas WHITE PAPER TABLE OF CONTENTS Overview Four Ways to Accelerate the Acquisition of Remote Sensing Data Maximize HPC Utilization Simplify and Optimize Data Distribution Improve Business
More informationPlanning and Implementing Disaster Recovery for DICOM Medical Images
Planning and Implementing Disaster Recovery for DICOM Medical Images A White Paper for Healthcare Imaging and IT Professionals I. Introduction It s a given - disaster will strike your medical imaging data
More information3Gen Data Deduplication Technical
3Gen Data Deduplication Technical Discussion NOTICE: This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change without notice and
More informationUsing In-Memory Data Grids for Global Data Integration
SCALEOUT SOFTWARE Using In-Memory Data Grids for Global Data Integration by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 B y enabling extremely fast and scalable data
More informationTop Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation
Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely
More information