CEDA Storage. Dr Matt Pritchard. Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk



Similar documents
Big Data and the Earth Observation and Climate Modelling Communities: JASMIN and CEMS

Virtualisation Cloud Computing at the RAL Tier 1. Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013

JASMIN Cloud ESGF and UV- CDAT Conference December 2014 STFC / Stephen Kill

JASMIN and CEMS: An Overview

JASMIN/CEMS Services and Facilities

JASMIN: the Joint Analysis System for big data. Bryan Lawrence

Parallel Processing using the LOTUS cluster

Outline. Introduction Virtualization Platform - Hypervisor High-level NAS Functions Applications Supported NAS models

White Paper. Recording Server Virtualization

HPC Advisory Council

vnas Series All-in-one NAS with virtualization platform

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed and Cloud Computing

CMIP6 Data Management at DKRZ

EMC BACKUP-AS-A-SERVICE

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

Symantec Backup Exec 2014 TM Licensing Guide

Experiences and challenges in the development of the JASMIN cloud service for the environmental science community

Emulex OneConnect 10GbE NICs The Right Solution for NAS Deployments

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Data Communications Hardware Data Communications Storage and Backup Data Communications Servers

(Scale Out NAS System)

Mit Soft- & Hardware zum Erfolg. Giuseppe Paletta

CyberStore WSS. Multi Award Winning. Broadberry. CyberStore WSS. Windows Storage Server 2012 Appliances. Powering these organisations

Arkivum's Digital Archive Managed Service

Steven Newhouse, Head of Technical Services

Doubling the I/O Performance of VMware vsphere 4.1

Performance Testing of a Cloud Service

Addendum No. 1 to Packet No Enterprise Data Storage Solution and Strategy for the Ingham County MIS Department

Seagate Cloud Systems & Solutions

10th TF-Storage Meeting

June Blade.org 2009 ALL RIGHTS RESERVED

NET ACCESS VOICE PRIVATE CLOUD

HPC Storage Solutions at transtec. Parallel NFS with Panasas ActiveStor

Table of Contents. Server Virtualization Peer Review cameron : modified, cameron

Low-cost BYO Mass Storage Project. James Cizek Unix Systems Manager Academic Computing and Networking Services

SMB Direct for SQL Server and Private Cloud

Microsoft Hyper-V chose a Primary Server Virtualization Platform

NOTICE ADDENDUM NO. TWO (2) JULY 8, 2011 CITY OF RIVIERA BEACH BID NO SERVER VIRTULIZATION/SAN PROJECT

Datto Whitepaper: Business Continuity Built from the Ground Up. Business Continuity Built from the Ground Up THE ONLY CONTINUITY

Restricted Document. Pulsant Technical Specification

IOmark- VDI. HP HP ConvergedSystem 242- HC StoreVirtual Test Report: VDI- HC b Test Report Date: 27, April

Configuration Maximums

Diablo and VMware TM powering SQL Server TM in Virtual SAN TM. A Diablo Technologies Whitepaper. May 2015

Amazon Cloud Storage Options

Solution Brief: Creating Avid Project Archives

Protect SQL Server 2012 AlwaysOn Availability Group with Hitachi Application Protector

Pivot3 Desktop Virtualization Appliances. vstac VDI Technology Overview

White Paper. Advanced Server Network Virtualization (NV) Acceleration for VXLAN

Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012

Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack

Scalable. Reliable. Flexible. High Performance Architecture. Fault Tolerant System Design. Expansion Options for Unique Business Needs

ADDENDUM 1 September 22, 2015 Request for Proposals: Data Center Implementation

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW

Scala Storage Scale-Out Clustered Storage White Paper

SPEED your path to virtualization.

SURFsara Data Services

Online Storage Replacement Strategy/Solution

EMC AVAMAR. a reason for Cloud. Deduplication backup software Replication for Disaster Recovery

Globus and the Centralized Research Data Infrastructure at CU Boulder

THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved.

Distributed Backup with the NetVault Plug-in for VMware for Scale and Performance

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

How To Build A Cloud Server For A Large Company

Balancing CPU, Storage

Hadoop Architecture. Part 1

Leveraging Virtualization for Disaster Recovery in Your Growing Business

Microsoft Exchange, Lync, and SharePoint Server 2010 on Dell Active System 800v

EMC XTREMIO AND MICROSOFT EXCHANGE DATABASES

Kriterien für ein PetaFlop System

Pivot3 Reference Architecture for VMware View Version 1.03

Data Protection with NETGEAR Storage. Smart Solutions for Disk-to-Disk Backup and Disaster Recovery for SMB s and Multi-Office Environments

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Solution Overview. Business Continuity with ReadyNAS

10GbE Ethernet for Transfer Speeds of Over 1150MB/s

Tandberg Data AccuVault RDX

The BIG Data Era has. your storage! Bratislava, Slovakia, 21st March 2013

Hadoop on the Gordon Data Intensive Cluster

VMware vsphere Design. 2nd Edition

NAS or iscsi? White Paper Selecting a storage system. Copyright 2007 Fusionstor. No.1

A Deduplication File System & Course Review

SAP HANA - an inflection point

Scalable. Reliable. Flexible. High Performance Architecture. Fault Tolerant System Design. Expansion Options for Unique Business Needs

Transcription:

CEDA Storage Dr Matt Pritchard Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk

How we store our data NAS Technology Backup JASMIN/CEMS

CEDA Storage Data stored as files on disk. Data is migrated from media to media (early data will have moved six time) Data is audited to make sure there is no silent corruption. Storage Classes Primary : main archive copy of data - Crown jewels Secondary : 2 nd copy of data - CEDA not primary copy Facilitative - CEDA merely helps redistribute data

Storage structure Logical path : an interface for users & services (this is the reference path for users) /badc/n/x /badc/n/y /neodc/m/z Filesets e.g. X exist within datasets e.g. N Break up datasets into manageable chunks for backup etc. Physical path : unseen by users, the real path to the data /archive/x /archive/y /archive/z Connected to logical path via symlinks in filesystem

Storage : Now foo.badc.rl.ac.uk (NAS server) /disks/foo1 (30 Tb) /disks/foo2 (30 Tb) bar.badc.rl.ac.uk (NAS server) /disks/bar1 (20 Tb) /disks/bar2 (20 Tb) baz.badc.rl.ac.uk (NAS server) /disks/baz1 (10 Tb) /disks/baz2 (10 Tb) System of symlnks and mountpoints builds virtual filesystem. Systems hosting data access services mount all storage filesystems in order to see entire structure. Scaled to < 1 Pb, but difficult to manage so many individual storage servers & mounted filesystems.

Centre for Environmental Data Archival CEDA Data Project Type Current volume (Tb) NEODC Earth Observation 300 BADC Atmospheric Science 350 CMIP5 Climate Model 350 Total 1000 Tb = 1 Pb

Backup StorageD Tape-based backup storage solution provided by STFC e- science centre Filesets marked for backup Rsynced to StorageD cache Written to tape Secondary tape copy made & kept off site Secondary online storage Some datasets mirrored using rsync to secondary online storage (for rapid recovery)

Storage : JASMIN/CEMS Storage blades arranged into bladesets (with 1+ director blade) Director blades respond to data request (share out load among cluster) Parallel access, high bandwidth Single namespace can appear as one huge filesystem Reality : break up into logical chunks, expandable into free space Vastly reduced number of filesystems to mount!

e-infrastructure e-infrastructure Investment JASMIN CEMS

JASMIN/CEMS Data Project JASMIN CEMS NEODC Current 300 BADC Current 350 CMIP5 Current 350 CEDA Expansion 200 200 CMIP5 Expansion 800 300 CORDEX 300 MONSooN Shared Data Other HPC Shared Data 400 600 User Scratch 500 300 Totals 3500 Tb 1100 Tb

JASMIN functions CEDA data storage & services Curated data archive Archive management services Archive access services (HTTP, FTP, Helpdesk,...) Data intensive scientific computing Global / regional datasets & models High spatial, temporal resolution Private cloud Flexible access to high-volume & complex data for climate & earth observation communities Online workspaces Services for sharing & collaboration

Use cases Processing large volume EO datasets to produce: Essential Climate Variables Long term global climate-quality datasets EO data validation & intercomparisons Evaluation of models relying on the required datasets (EO datasets & in situ ) and simulations) being in the same place

Use cases User access to 5 th Coupled Model Intercomparison Project (CMIP5) Large volumes of data from best climate models Greater throughput required Large model analysis facility Workspaces for scientific users. Climate modellers need 100s of Tb of disk space, with high-speed connectivity UPSCALE project 250 Tb in 1 year PRACE supercomputing facility in Germany (HERMIT) Being shipped to RAL at present To be analysed by Met Office as soon as available Deployment of VMs running custom scientific software, co-located with data Outputs migrated to long term archive (BADC)

JASMIN-North University of Leeds 150 Tb JASMIN locations JASMIN-West University of Bristol 150 Tb JASMIN-Core STFC RAL 3.5 Pb + compute JASMIN-South University of Reading 500 Tb + compute

JASMIN kit

JASMIN kit JASMIN/CEMS Facts and figures JASMIN: 3.5 Petabytes Panasas Storage 12 x Dell R610 (12 core, 3.0GHz, 96G RAM)Servers 1 x Dell R815 (48 core, 2.2GHz, 128G RAM)Servers 1 x Dell Equalogic R6510E (48 TB iscsi VMware VM image store) VMWare vsphere Center 8 x Dell R610 (12 core, 3.5GHz, 48G RAM) Servers 1 x Force10 S4810P 10GbE Storage Aggregation Switch 4 x Gnodal GS4008 10/40Gbe switched stack

JASMIN kit JASMIN/CEMS Facts and figures CEMS: 1.1 Petabytes Panasas Storage 7 x Dell R610 (12 core 96G RAM) Servers 1 x Dell Equalogic R6510E (48 TB iscsi VMware VM image store) VMWare vsphere Center + vcloud Director

JASMIN kit JASMIN/CEMS Facts and figures Complete 4.5 PB (usable - 6.6PB raw) Panasas storage managed as one store, consisting of: 103 4U Shelves of 11 Storage Blades 1,133 (-29) Storage Blades with 2x 3TB drives each 2,266 3.5" Disc Drives (3TB Each) 103 * 11 * 1-29 = 1,104 CPUs (Celeron 1.33GHz CPU w. 4GB RAM) 29 Director Blades with Dual Core Xeon 1.73GHz w.8gb RAM) 15 kw Power in / heat out per rack = 180 kw (10-20 houses worth) 600kg per rack = 7.2 Tonnes 1.03 Tb/s total storage bandwidth = Copying 1500 DVDs per minute 4.6PB Useable == 920,000 DVD's = a 1.47 km high tower of DVDs 4.6PB Useable == 7,077,000 CDs = a 11.3 km high tower of CDs

JASMIN links

http://www.ceda.ac.uk http://www.stfc.ac.uk/e-science/38663.aspx Thank you!