UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure


Authors: A O Jaunsen, G S Dahiya, H A Eide, E Midttun
Date: Dec 15, 2015

Summary

UNINETT Sigma2 provides High Performance Computing (HPC) and data services to the national research sector. On October 30, 2015 the Sigma2 board decided that the storage infrastructure should be based on a distributed data centre (DDC) model in which the storage resources are spread across two sites that appear and function as a single data infrastructure. Each of the two HPC systems that will be procured will also be located in the same facilities as the storage resources. This comes at a cost increase due to two sites and an overhead on resource capacities, but will provide the benefit of dual site reliability.

The user requirements suggest that data must be connected to resources that allow the users to process, analyse, share and publish the data. The data must be accessible in a simple and consistent manner that accommodates not only traditional compute intensive users, but also new user communities that are data driven. Research infrastructures and larger laboratory facilities that generate large amounts of data must be able to ingest their data via effective protocols that are supported by the data infrastructure.

Data must be stored safely with a redundancy that will withstand failure at any level (disk, server, rack, site), while at the same time avoiding unnecessary duplication of data through project based policies. The requirements for a given research dataset may change during its lifetime, and the infrastructure should enable seamless migration of data between the relevant storage devices based on data management policies and usage. The future national data infrastructure must be a scalable, reliable and flexible facility that can accommodate the vast majority of relevant users and communities.
In the distributed infrastructure, the physical storage resources are located at separate geographical locations (hundreds to a thousand kilometres apart) and thus consist of at least two data centres. This infrastructure relies on replicating data between the two data centres with a redundancy that allows one of the data centres to be lost or become unavailable without losing access to the data, as the data is also stored on the other site. This solution relies on adequate network bandwidth between the data centres. Services are provided at all data centres, and each of the HPC systems is directly connected to the corresponding data store to achieve good integration among all infrastructure components.
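The overhead on resource capacities mentioned above can be illustrated with a small calculation. The following is a minimal sketch only: it assumes that each site holds a full copy of the data and that each copy is protected within the site by a k+m erasure code. The function name and the default 8+2 scheme are hypothetical choices for illustration and are not taken from the Sigma2 design.

```python
def raw_capacity_pb(usable_pb: float, sites: int = 2,
                    data_chunks: int = 8, parity_chunks: int = 2) -> float:
    """Raw capacity (PB) needed to hold `usable_pb` of user data.

    Assumptions (illustrative only): every site stores a complete copy,
    and each copy is erasure coded as data_chunks + parity_chunks.
    """
    # Within one site, k data chunks are stored as k + m chunks.
    per_site_overhead = (data_chunks + parity_chunks) / data_chunks
    return usable_pb * sites * per_site_overhead


if __name__ == "__main__":
    for usable in (1.0, 10.0):
        print(f"{usable:5.1f} PB usable -> {raw_capacity_pb(usable):5.1f} PB raw "
              f"(2 sites, 8+2 erasure coding)")
```

Under these assumptions, the two-site model doubles whatever overhead the per-site protection scheme already carries, which is the cost the dual site reliability is weighed against.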

A major challenge for any future data infrastructure is how to tackle the growth of data. The data not only needs to be stored safely, but also needs to be connected to resources that enable services such as compute, visualization and analysis. Data accessibility is therefore of key importance. Transporting data is already an increasing challenge, with the growth of data exceeding the increments in network capacity. Dedicated high bandwidth network links come at a significant cost. A data centric architecture is therefore an appealing concept that is well suited to meet the challenges of a future national e-infrastructure for scientific research.

Data Life Cycle

Data is categorised in different classes that represent the relevant stages of the data life cycle. This cycle typically starts with the creation of data in labs, experiments or computing environments, and continues with an active phase in which the data is processed and prepared for interpretation and analysis. When the data has been thoroughly analysed, it is expected that the data is archived and possibly published. The illustration shows the typical phases of the scientific process from a data life cycle point of view.

Data classification

Currently research data are dispersed over four HPC systems and one data infrastructure in Norway. The goal is to consolidate these data in one infrastructure. On the HPC systems three different types of data store are found: i) runtime data produced during computations (/work), ii) the user's own data on the system (/home) and iii) project data stored on dedicated disks (/project). The runtime data storage (i) has strict performance and integration requirements and will therefore be procured as part of the HPC system. The remaining data stores (ii and iii) will be hosted on the new data infrastructure, including non-HPC /project data.

The following table describes the various data classes and suggests what type of data may be found in each class.

Class: raw (/project)
Description: Data that is not reproducible and is in its pristine state is considered raw. Such data is typically the result of measurements recorded by an instrument or experiment, and typically requires further processing to become meaningful. Raw data is static by definition and may be required in the future to enable reproduction of previous results or to verify or dismiss claimed errors. This class of data is therefore valuable, and best practices often recommend that such data be secured for the future.

Class: hot (/home, /project, /work)
Description: Data that is in active use and accessed frequently is dubbed hot. Hot data is typically accessed and processed, and can serve as input to new calculations on an HPC system or an analytics service, for instance. It is therefore necessary to maintain this data on a storage technology with high read and write performance that can also cope with multiple users and processes. The performance is determined by the connectivity between the storage system and the compute resources (such as HPC). The filesystem performance of the storage resource may be improved by using fast storage media (SSDs) for caching data.

Class: cold (/project, /home)
Description: Data that is still relevant, but accessed less frequently. Cold data should be accessible via various protocols (e.g. POSIX, S3, HTTP REST), but can be stored on consumer hardware (e.g. SATA drives).

Class: copy (/project/copy)
Description: Data that serves as a (backup) copy only. This is a subclass of cold data and it is only accessed in the event of the (external) data becoming corrupted. Data of this type must be stored with a reference checksum value and can be stored on very cost effective high density disks such as Shingled Magnetic Recording disks.

Class: published (not mounted)
Description: Data that is archived and published by issuing a DOI. Such data is typically archived for several years, but with no curation requirements. It is expected that the data is no longer useful after a decade and can, in principle, be deleted after such time.

Class: curated (not mounted)
Description: Published data that has a (documented) need for long term preservation and permanent storage must be curated by a data librarian and set up with a preservation plan.

Infrastructure configuration

To safeguard against loss of data in catastrophic events such as a fire or flooding, it is necessary to have data redundancy between two sites. In this way one data centre site can be lost while the other site still provides access to the data and services. In the unlikely event of such a scenario, the data redundancy must be restored within the remaining site (provided there are sufficient resources available) or to a third (backup) site. A national data infrastructure will therefore have a minimum of two sites for reliability and availability reasons. Below we describe a possible architecture to achieve this in a two-site scenario.

Distributed Data Centre

In the Distributed Data Centre (DDC) configuration, the two sites are geographically separated by distances of typically hundreds of kilometres.
This means that two data centres are required, and the interconnect between them will rely on a Wide Area Network (WAN), connected either via the national research network (Forskningsnettet) or via a dedicated fibre. Latency limits the performance of synchronous data replication over such distances, so it is necessary to rely on asynchronous replication between sites. Within a site this configuration retains the performance between the compute services (e.g. HPC, data analytics, visualization services) and the stored data, but the data replication between site A and site B will be asynchronous, meaning that the two sites are not guaranteed to be in sync at all times.
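The latency argument can be made concrete with a rough estimate. The sketch below assumes that light propagates at about 200,000 km/s in optical fibre (roughly two thirds of its speed in vacuum) and ignores switching, routing and storage latency, so the figures are lower bounds on delay only. The function names are illustrative.

```python
# Assumed propagation speed of light in optical fibre (about 2/3 of c).
FIBRE_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay in milliseconds."""
    return 2.0 * distance_km * 1000.0 / FIBRE_SPEED_KM_PER_S


def max_serialised_sync_writes_per_s(distance_km: float) -> float:
    """Upper bound on writes per second when each write must wait for
    one acknowledgement round trip before the next one is issued."""
    return 1000.0 / round_trip_ms(distance_km)


if __name__ == "__main__":
    for km in (100, 500, 1000):
        print(f"{km:>5} km: round trip >= {round_trip_ms(km):.1f} ms, "
              f"at most {max_serialised_sync_writes_per_s(km):.0f} "
              f"serialised synchronous writes per second")
```

At 1000 km the propagation delay alone adds about 10 ms to every synchronously replicated write, which is why the replication between sites is expected to be asynchronous.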

This configuration has the benefit that it resists catastrophic events that would take out one entire data centre, while at the same time maintaining all data intact on the remaining site. It does require a higher degree of storage redundancy.

User Requirements

UNINETT Sigma2 completed a user survey in June 2015 to collect current and future requirements from scientific communities in Norway. The results of this survey show a strong dependence on the infrastructure and frequent use of the key NorStore data storage services, such as the project area, the NorStore archive and the services for sensitive data. For these use cases, users have requested that data be kept for anything from three (3) months to several years. Users are currently required to use the traditional SSH/SFTP based tools to access their data and the available resources. There is a demand for more user friendly and flexible services, e.g. desktop clients using WebDAV/SMB or a Dropbox-like Sync 'n' Share service to interact with data services.

Currently, when users need to process data stored in the national infrastructure, they are required to copy this data manually to the HPC project space. This results in duplication of data and limits the users to the storage capacity available on the HPC facility. Moreover, manual copying and modification cause the copies to get out of sync, which results in a bad user experience. In the user survey, users have asked to have data directly accessible on the HPC system for processing. This will result in better utilization of storage resources by avoiding duplication, and will offer a smoother user experience. Finally, there is an increasing need for storage and data management services for non-HPC users, and in particular for shared project areas with fine grained access control, metadata and publishing services.

In addition to the current data services, users have expressed interest in dedicated (compute) resources, data analytics and visualisation services. Dedicated resources give users a certain amount of reserved compute and storage to perform certain tasks at short notice and with high priority. The compute and storage resources would be reserved for the full duration of the user's allocation. The data analytics service is about analysing big data using frameworks such as Apache Spark/Hadoop. The visualization service allows users to remotely visualize large datasets using dedicated hardware, e.g. GPUs.

In a data centric infrastructure, laboratories and research infrastructures generating large or steady streams of data should be able to store their data in, and use, the national data infrastructure. Research institutions can benefit from permanently connecting their resources to the national data infrastructure via suitable protocols such as S3, REST or Sync 'n' Share (Dropbox-like). Examples of such facilities may be genome sequencers or other high data volume research instruments.

A Service Oriented National e-Infrastructure

The HPC system requires a fast and low latency scratch storage space, which is usually available under /work. This storage space will be part of a separate storage system and is procured with the HPC system. In addition to scratch space, an HPC system requires access to the users' /home directories and to project data storage. The /home storage represents the space allocated to users to store their code or configuration data. The project storage represents the space allocated to large data sets, e.g. databases of genomes or reference data. This space is usually shared among a group of users. Access to these data stores should be provided to the HPC users through a parallel POSIX file system and/or HTTP REST based protocols.

Access to data is needed not only for compute intensive services such as HPC, but also for data intensive services. These are services where the number of operations per byte is very small, such as scaling all entries in a file, changing the format of an image or a genome, or transcoding a video. Other data centric services include visualisation and animation. These often require large datasets to be processed to make images that are either displayed or sequenced to make animations.

With the recent increase in the amount of unstructured and machine generated data, many open source frameworks have been developed to process large data sets in parallel on commodity hardware. Such processing requires access to vast quantities of data collected from different sources, such as sensor arrays, the internet and genomic data. The frameworks, e.g. Apache Spark/Hadoop, process such datasets in a distributed, fault tolerant way and enable analytics at large scale.

Currently a national sensitive data service is provided by the University of Oslo (USIT). The sensitive data service is an important part of the national services, and a significant number of users within the fields of medicine and life science rely on this service today.
UNINETT Sigma2 AS has supported the development and is currently offering resources to this service from the national storage resource pool. It may be challenging to migrate the service during the first year of operation of the new infrastructure, and it is therefore likely that the service will, at least initially, continue to be provided as it is today.

The storage infrastructure should be able to serve these needs for the various national services and satisfy performance requirements by combining different storage media, e.g. SSDs and SATA disks.

Storage Requirements for Services

Taking the user requirements for current and future services into consideration, we require the storage solution to be scalable, reliable, flexible and policy driven. The focus of the storage system is to be an enabler of new services and to allow users to interact with the national e-infrastructure in an intuitive and flexible way. The storage system should provide a global namespace and enable users to access their data from any resource or site.

The storage system should support the use of different storage technologies to balance high performance (read and write) against low cost capacity storage. Depending on the access pattern of specific data objects, the system should automatically move the data between performance and capacity storage pools, e.g. moving data that is infrequently modified or accessed from the performance storage pool to the more cost effective capacity storage pool (e.g. erasure coded storage on SATA disks).

Data from the storage system should be accessible on all the national services. The users should be able to deposit data in the storage system using the different protocols listed in the table below. Once data is received in the storage system, users can access this data from the HPC systems or any other service offered by the national e-infrastructure.

The storage system should be free from single points of failure (SPOFs), mainly by ensuring component and data redundancy. The system should be able to operate normally under disk, controller, server, rack or site failure, given enough free capacity. To provide redundancy in case of fire or natural disaster, the storage system needs to support geo-replication across a minimum of two sites. The replication can be performed asynchronously to avoid the performance penalty and to reduce the requirement for large network bandwidth between the two sites.

Protocols for Interacting with Data Storage Resources

The table below lists the expected relevant protocols for various key services.
Service: User data deposit/access
Protocols: Sync 'n' Share (WebDAV or similar), POSIX, SSH/SFTP, object based HTTP REST API (S3)

Service: HPC Data Access
Protocols: Parallel POSIX compliant file system, object based HTTP REST API (S3)

Service: Data Analytics
Protocols: Parallel POSIX/Hadoop compliant file system, object based HTTP REST API (S3)

Service: Dedicated Resources
Protocols: Block storage for virtual machines, POSIX/SMB/CIFS file system access

Service: Visualisation
Protocols: Parallel POSIX compliant file system

Backup service

It is necessary to offer a backup service that can secure the history (changes and deletions) of the /home area. It will also be required to provide a form of backup service for the /project data, either through local snapshots on the storage system or through some other form of undelete functionality. The backup service should be cost effective enough to be feasible for multi petabyte storage. Vendors should state which backup software products they support.
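The policy driven movement of data between performance and capacity pools described under Storage Requirements for Services could be sketched as follows. This is an illustration under stated assumptions, not a description of any vendor's tiering engine: the `StoredObject` type, the pool names and the 90-day idle threshold are all hypothetical, and a real system would also weigh object size, project policy and pool utilisation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    name: str
    pool: str              # "performance" or "capacity" (hypothetical pool names)
    last_access: datetime


def plan_migrations(objects, now, cold_after=timedelta(days=90)):
    """Return a list of (name, source_pool, target_pool) moves.

    Objects idle for at least `cold_after` are demoted to the capacity
    pool; recently accessed objects in the capacity pool are promoted.
    The function only plans moves and does not modify the objects.
    """
    moves = []
    for obj in objects:
        idle = now - obj.last_access
        if obj.pool == "performance" and idle >= cold_after:
            moves.append((obj.name, "performance", "capacity"))
        elif obj.pool == "capacity" and idle < cold_after:
            moves.append((obj.name, "capacity", "performance"))
    return moves


if __name__ == "__main__":
    now = datetime(2016, 1, 1)
    inventory = [
        StoredObject("genome-ref", "performance", now - timedelta(days=200)),
        StoredObject("run-042", "performance", now - timedelta(days=1)),
        StoredObject("survey-2014", "capacity", now - timedelta(days=2)),
    ]
    for name, src, dst in plan_migrations(inventory, now):
        print(f"move {name}: {src} -> {dst}")
```

Keeping the planning step separate from the actual data movement makes it possible to review or rate limit migrations before they are carried out.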

Amazon Cloud Storage Options

Amazon Cloud Storage Options Amazon Cloud Storage Options Table of Contents 1. Overview of AWS Storage Options 02 2. Why you should use the AWS Storage 02 3. How to get Data into the AWS.03 4. Types of AWS Storage Options.03 5. Object

More information

Diagram 1: Islands of storage across a digital broadcast workflow

Diagram 1: Islands of storage across a digital broadcast workflow XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,

More information

Westek Technology Snapshot and HA iscsi Replication Suite

Westek Technology Snapshot and HA iscsi Replication Suite Westek Technology Snapshot and HA iscsi Replication Suite Westek s Power iscsi models have feature options to provide both time stamped snapshots of your data; and real time block level data replication

More information

Designing a Cloud Storage System

Designing a Cloud Storage System Designing a Cloud Storage System End to End Cloud Storage When designing a cloud storage system, there is value in decoupling the system s archival capacity (its ability to persistently store large volumes

More information

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER STORAGE CENTER DATASHEET STORAGE CENTER Go Beyond the Boundaries of Traditional Storage Systems Today s storage vendors promise to reduce the amount of time and money companies spend on storage but instead

More information

Application Brief: Using Titan for MS SQL

Application Brief: Using Titan for MS SQL Application Brief: Using Titan for MS Abstract Businesses rely heavily on databases for day-today transactions and for business decision systems. In today s information age, databases form the critical

More information

June 2009. Blade.org 2009 ALL RIGHTS RESERVED

June 2009. Blade.org 2009 ALL RIGHTS RESERVED Contributions for this vendor neutral technology paper have been provided by Blade.org members including NetApp, BLADE Network Technologies, and Double-Take Software. June 2009 Blade.org 2009 ALL RIGHTS

More information

Technology Insight Series

Technology Insight Series HP s Information Supply Chain Optimizing Information, Data and Storage for Business Value John Webster August, 2011 Technology Insight Series Evaluator Group Copyright 2011 Evaluator Group, Inc. All rights

More information

CDH AND BUSINESS CONTINUITY:

CDH AND BUSINESS CONTINUITY: WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable

More information

With DDN Big Data Storage

With DDN Big Data Storage DDN Solution Brief Accelerate > ISR With DDN Big Data Storage The Way to Capture and Analyze the Growing Amount of Data Created by New Technologies 2012 DataDirect Networks. All Rights Reserved. The Big

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Mauro Fruet University of Trento - Italy 2011/12/19 Mauro Fruet (UniTN) Distributed File Systems 2011/12/19 1 / 39 Outline 1 Distributed File Systems 2 The Google File System (GFS)

More information

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage Clodoaldo Barrera Chief Technical Strategist IBM System Storage Making a successful transition to Software Defined Storage Open Server Summit Santa Clara Nov 2014 Data at the core of everything Data is

More information

AUTOMATED DATA RETENTION WITH EMC ISILON SMARTLOCK

AUTOMATED DATA RETENTION WITH EMC ISILON SMARTLOCK White Paper AUTOMATED DATA RETENTION WITH EMC ISILON SMARTLOCK Abstract EMC Isilon SmartLock protects critical data against accidental, malicious or premature deletion or alteration. Whether you need to

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Technology Insight Series

Technology Insight Series Evaluating Storage Technologies for Virtual Server Environments Russ Fellows June, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved Executive Summary

More information

Maxta Storage Platform Enterprise Storage Re-defined

Maxta Storage Platform Enterprise Storage Re-defined Maxta Storage Platform Enterprise Storage Re-defined WHITE PAPER Software-Defined Data Center The Software-Defined Data Center (SDDC) is a unified data center platform that delivers converged computing,

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution. Database Solutions Engineering

Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution. Database Solutions Engineering Protecting Microsoft SQL Server with an Integrated Dell / CommVault Solution Database Solutions Engineering By Subhashini Prem and Leena Kushwaha Dell Product Group March 2009 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything BlueArc unified network storage systems 7th TF-Storage Meeting Scale Bigger, Store Smarter, Accelerate Everything BlueArc s Heritage Private Company, founded in 1998 Headquarters in San Jose, CA Highest

More information

Cloud Based Application Architectures using Smart Computing

Cloud Based Application Architectures using Smart Computing Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products

More information

Big data management with IBM General Parallel File System

Big data management with IBM General Parallel File System Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers

More information

CSE-E5430 Scalable Cloud Computing P Lecture 5

CSE-E5430 Scalable Cloud Computing P Lecture 5 CSE-E5430 Scalable Cloud Computing P Lecture 5 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 12.10-2015 1/34 Fault Tolerance Strategies for Storage

More information

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Clear the way for new business opportunities. Unlock the power of data. Overcoming storage limitations Unpredictable data growth

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

SQL Server Storage Best Practice Discussion Dell EqualLogic

SQL Server Storage Best Practice Discussion Dell EqualLogic SQL Server Storage Best Practice Discussion Dell EqualLogic What s keeping you up at night? Managing the demands of a SQL environment Risk Cost Data loss Application unavailability Data growth SQL Server

More information

OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006

OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006 OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006 EXECUTIVE SUMMARY Microsoft Exchange Server is a disk-intensive application that requires high speed storage to deliver

More information

Data Centric Computing Revisited

Data Centric Computing Revisited Piyush Chaudhary Technical Computing Solutions Data Centric Computing Revisited SPXXL/SCICOMP Summer 2013 Bottom line: It is a time of Powerful Information Data volume is on the rise Dimensions of data

More information

Business Continuity: Choosing the Right Technology Solution

Business Continuity: Choosing the Right Technology Solution Business Continuity: Choosing the Right Technology Solution Table of Contents Introduction 3 What are the Options? 3 How to Assess Solutions 6 What to Look for in a Solution 8 Final Thoughts 9 About Neverfail

More information

EMC CLOUDARRAY PRODUCT DESCRIPTION GUIDE

EMC CLOUDARRAY PRODUCT DESCRIPTION GUIDE EMC CLOUDARRAY PRODUCT DESCRIPTION GUIDE INTRODUCTION IT organizations today grapple with two critical data storage challenges: the exponential growth of data and an increasing need to keep more data for

More information

Technical Brief: Global File Locking

Technical Brief: Global File Locking Nasuni enables collaboration among users of files no matter where users are located Introduction The Nasuni Service combines the availability and scale of cloud storage with the local performance found

More information

ETERNUS CS High End Unified Data Protection

ETERNUS CS High End Unified Data Protection ETERNUS CS High End Unified Data Protection Optimized Backup and Archiving with ETERNUS CS High End 0 Data Protection Issues addressed by ETERNUS CS HE 60% of data growth p.a. Rising back-up windows Too

More information

Neverfail for Windows Applications June 2010

Neverfail for Windows Applications June 2010 Neverfail for Windows Applications June 2010 Neverfail, from Neverfail Ltd. (www.neverfailgroup.com), ensures continuity of user services provided by Microsoft Windows applications via data replication

More information

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression Sponsored by: Oracle Steven Scully May 2010 Benjamin Woo IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication September 2002 IBM Storage Products Division Raleigh, NC http://www.storage.ibm.com Table of contents Introduction... 3 Key

More information

IBM Global Technology Services September 2007. NAS systems scale out to meet growing storage demand.

IBM Global Technology Services September 2007. NAS systems scale out to meet growing storage demand. IBM Global Technology Services September 2007 NAS systems scale out to meet Page 2 Contents 2 Introduction 2 Understanding the traditional NAS role 3 Gaining NAS benefits 4 NAS shortcomings in enterprise

More information

Backup Software? Article on things to consider when looking for a backup solution. 11/09/2015 Backup Appliance or

Backup Software? Article on things to consider when looking for a backup solution. 11/09/2015 Backup Appliance or 11/09/2015 Backup Appliance or Backup Software? Article on things to consider when looking for a backup solution. Ray Quattromini FORTUNA POWER SYSTEMS LTD T: 01256 782030 E: RAY@FORTUNADATA.COM W: WWW.FORTUNADATA.COM

More information

Introduction. Setup of Exchange in a VM. VMware Infrastructure

Introduction. Setup of Exchange in a VM. VMware Infrastructure Introduction VMware Infrastructure is deployed in data centers for deploying mission critical applications. Deployment of Microsoft Exchange is a very important task for the IT staff. Email system is an

More information

August 2009. Transforming your Information Infrastructure with IBM s Storage Cloud Solution
