David Minor. Chronopolis Program Manager Director, Digital Preserva7on Ini7a7ves UCSD Library San Diego Supercomputer Center



Similar documents
U"lizing the SDSC Cloud Storage Service

Perspec'ves on SDN. Roadmap to SDN Workshop, LBL

Cost Effective Backup with Deduplication. Copyright 2009 EMC Corporation. All rights reserved.

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS

Redefining Oracle Database Management

Technical Overview Simple, Scalable, Object Storage Software

Cloud Gateway. Agenda. Cloud concepts Gateway concepts My work. Monica Stebbins

StorReduce Technical White Paper Cloud-based Data Deduplication

HIGH-SPEED BRIDGE TO CLOUD STORAGE

Protect Data... in the Cloud

Keys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

Phone Systems Buyer s Guide

ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy

Backing up to the Cloud

Archiving On-Premise and in the Cloud. March 2015

How To Image A Single Vm For Forensic Analysis On Vmwarehouse.Com

GPFS Cloud ILM. IBM Research - Zurich. Storage Research Technology Outlook

Data Centers and Cloud Computing. Data Centers

Using Red Hat Network Satellite Server to Manage Dell PowerEdge Servers

Microsoft Azure Cloud on your terms. Start your cloud journey.

Data Storage Options for Research

HOSTWAY. FlexCloudTM. Servers

Deploying ArcGIS for Server using Managed Services

Research Data Storage, Sharing, and Transfer Options

Cloud 101. Mike Gangl, Caltech/JPL, 2015 California Institute of Technology. Government sponsorship acknowledged

Hewlett Packard - NBU partnership : SAN (Storage Area Network) или какво стои зад облаците

Building Storage as a Service with OpenStack. Greg Elkinbard Senior Technical Director

Wireless Networks: Network Protocols/Mobile IP

VoIP Security How to prevent eavesdropping on VoIP conversa8ons. Dmitry Dessiatnikov

Big Data. The Big Picture. Our flexible and efficient Big Data solu9ons open the door to new opportuni9es and new business areas

Navigating The World of Cloud Computing

Ultra-Scalable Storage Provides Low Cost Virtualization Solutions

Service Description Cloud Storage Openstack Swift

Arista 7060X and 7260X series: Q&A

Introduction to OpenStack Swift CloudOpen Japan 2014

ZEN LOAD BALANCER EE v3.02 DATASHEET The Load Balancing made easy

RECOVERY SCALABLE STORAGE

Auspex. NAS/SAN Integration

Web Drive Limited TERMS AND CONDITIONS FOR THE SUPPLY OF SERVER HOSTING

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data

Protecting enterprise servers with StoreOnce and CommVault Simpana

12 Key File Sync and Share Advantages of Transporter Over Box for Enterprise

Cloud Storage and Backup

owncloud Architecture Overview

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

How To Create A Large Enterprise Cloud Storage System From A Large Server (Cisco Mds 9000) Family 2 (Cio) 2 (Mds) 2) (Cisa) 2-Year-Old (Cica) 2.5

Understanding AWS Storage Options

Solving today's challenges with Oracle SOA Suite, and Oracle Coherence

Data Loss Prevention (DLP) & Recovery Methodologies

Cisco Hybrid Cloud Solution: Deploy an E-Business Application with Cisco Intercloud Fabric for Business Reference Architecture

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?

(Scale Out NAS System)

Restoration Technologies. Mike Fishman / EMC Corp.

Increased Security, Greater Agility, Lower Costs for AWS DELPHIX FOR AMAZON WEB SERVICES WHITE PAPER

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

Backup and Disaster Recovery Planning On a Budget. Presented by: Najam Saeed Lisa Ulrich

Computer Networks. Examples of network applica3ons. Applica3on Layer

An Introduction to Cloud Computing Concepts

Using HP StoreOnce Backup Systems for NDMP backups with Symantec NetBackup

WHITEPAPER: Understanding Pillar Axiom Data Protection Options

White Paper. What is IP SAN?

Deploying a distributed data storage system on the UK National Grid Service using federated SRB

Amazon Cloud Storage Options

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER

High Availability Databases based on Oracle 10g RAC on Linux

Offensive & Defensive & Forensic Techniques for Determining Web User Iden<ty

Building a Scalable Big Data Infrastructure for Dynamic Workflows

Transcription:

David Minor Chronopolis Program Manager Director, Digital Preserva7on Ini7a7ves UCSD Library San Diego Supercomputer Center

SDSC Cloud now in produc7on UCSD Library DAMS use of Cloud DuraCloud + SDSC Cloud DuraCloud + Chronopolis Digital Preserva7on Network

SDSC Cloud Storage Built on OpenStack swih object storage sohware. Supports Rackspace/ SwiH API and a subset of Amazon s S3. Ini7al 5.5PB Raw Storage, scalable to over 100PB with equal performance scaling. Supported by redundant Arista Networks 7508 switches, providing 768 total 10 gigabit (Gb) Ethernet ports for more than 10Tbit/s of non- blocking, IP- based connec7vity. Peak transfer rates of up to 8GB/sec. Direct connec7ons to CENIC, ESNet, and XSEDE networks provides high- bandwidth wide area connec7vity. Managed with Rocks cluster toolkit. SwiH Roll available soon. AES 256- bit server side encryp7on for at rest data is available and deployed in a local HIPAA environment. h`p://cloud.sdsc.edu

Service Architecture Object Storage Services Storage Customers TradiConal Clients GUI Applica7ons Command Line Tools Dropbox style website Web Services API Amazon S3, Rackspace Cloud Services Commercial Vendors Commvault Backups Or most Rackspace API Supported Apps.. Instantly Available to External Users Via HTTP and Other Cloud Users Load Balanced Proxy Servers Swi> Object Storage Cluster

Goals for SDSC Cloud Support NSF Data Management Plans Required Plan to describe how research results are shared. 99.5% system availability File replica7on automated Default 2 copies, able to keep addi7onal offsite replica7ons. Automated checksum verifica7on and error correc7on Scalable Performance and capacity grows by incremental bricks. Mul7faceted accessibility Web, API, Graphical and Command Line Clients h`p://cloud.sdsc.edu Cost compe77ve Operated as a recharge service On par with current tape- based dual- copy costs of $0.0325/GB/Mo.

UCSD Library DAMS use of Cloud (slides provided by Declan Fleming)

DAMS Storage Story Started with 4T on SRB 8 years ago Eventually moved everything to local SAMBA Grown to 16T now UCSD research data pilots required 100T+

DAMS Implementa7on RackSpace CloudFiles Java Library RESTian 13% faster for small files 17% faster for large files SAMBA vs. cloud upload and download speeds quite similar Colocated server at SDSC produced download speeds some7mes 50% faster with OpenStack

Large Files with OpenStack Files larger than 10GB revealed a bug OpenStack requires breaking files larger than 5GB into segments returns those segments in lexical sort order, rather than in crea7on order So if the segments are named 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, they will be returned in the order 1, 10, 2, 3, 4, 5, 6, 7, 8, 9 Padding the numbers "0001" to make sure they sort in numerical order fixed this problem Once dealt with, performance was independent of file size

OpenStack Storage Organiza7on Containers vs Subdirectories How many containers should we use? One per object? One container for all objects? Tested with 10000 TIFFS, 185G, from 1 to 10000 containers Sweet spot was around 100 containers Learned later that every container is a SQLite instance Also learned that some other services will limit the number of containers per user

Victory declared. Now star7ng process of moving dozens of terabytes of research data into cloud- hosted DAMS. Declan Fleming - dfleming@ucsd.edu

DuraCloud and SDSC Cloud

DuraCloud + SDSC Cloud SDSC Cloud now an op7on in DuraCloud Available in similar fashion as Amazon and Rackspace DuraCloud customers can now have a single service agreement and interface to access commercial and academic cloud providers

DuraCloud + SDSC Cloud "SDSC is an important strategic partner for us in that they are the first academic- based produc7on cloud offering and we envision together building an academic- based open cloud infrastructure with a layer of services provided by DuraCloud. We see this as the first step in this direc7on and are thrilled. - Michele Kimpton, DuraSpace CEO

DuraCloud + Chronopolis integra7on

Chronopolis + DuraCloud Chronopolis will be a preserva7on op7on within DuraCloud offerings Rolling into produc7on this fall On the backend, enabled through SDSC Cloud and related services

Process Data movement between systems happens in SDSC cloud Mediated by set of RESTful services Data handed off from DuraCloud with manifest, checksums, etc.

SDSC Cloud RESTful server SDSC Chronopolis storage NCAR UMD

Outcome Successful integra7on of two NDIIPP- supported preserva7on projects Chronopolis offers a TRAC- cer7fied archival backend to DuraCloud Chronopolis offers a low- cost snapshot of DuraCloud holdings DuraCloud offers a seamless op7on for orgs wan7ng data services and preserva7on Demonstra7on of uni7ng two systems via channels of communica7on and services

For more informa7on Andrew Woods, DuraCloud: awoods@duraspace.org David Minor, Mike Smorul, Chronopolis: minor@sdsc.edu msmorul@sesync.org Also will be presented as a poster at ipres