CSC Storage Projects. CSC = CSC - IT Center for Science NSC'08 Ari Lukkarinen

Similar documents
High Availability Databases based on Oracle 10g RAC on Linux

Tier0 plans and security and backup policy proposals

Staying afloat in today s Storage Pools. Bob Trotter IGeLU 2009 Conference September 7-9, 2009 Helsinki, Finland

Maurice Askinazi Ofer Rind Tony Wong. Cornell Nov. 2, 2010 Storage at BNL

Towards Greener DataCenters

Managing managed storage

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010

Scientific Storage at FNAL. Gerard Bernabeu Altayo Dmitry Litvintsev Gene Oleynik 14/10/2015

REQUEST FOR PROPOSAL #347 SERVER VIRTUALIZATION QUESTIONS & ANSWERS

SAN TECHNICAL - DETAILS/ SPECIFICATIONS

Reference Architectures for Repositories and Preservation Archiving

Sun Constellation System: The Open Petascale Computing Architecture

Enabling Technologies for Distributed and Cloud Computing

PART 1: Breaking the Connections

The Modern Virtualized Data Center

HIP Computing Resources for LHC-startup

Enabling Technologies for Distributed Computing

New Storage System Solutions

Preview of a Novel Architecture for Large Scale Storage

Mass Storage System for Disk and Tape resources at the Tier1.

System Requirements for Netmail Archive

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

7 Real Benefits of a Virtual Infrastructure

Disaster Recovery Checklist Disaster Recovery Plan for <System One>

PADS GPFS Filesystem: Crash Root Cause Analysis. Computation Institute

WASHTENAW COUNTY FINANCE DEPARTMENT Purchasing Division

Backup for branch offices and compartment backups. Måns Höiom & Rikard Lindkvist

HPSA Agent Characterization

ZFS Backup Platform. ZFS Backup Platform. Senior Systems Analyst TalkTalk Group. Robert Milkowski.

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

How To Back Up A Computer To A Backup On A Hard Drive On A Microsoft Macbook (Or Ipad) With A Backup From A Flash Drive To A Flash Memory (Or A Flash) On A Flash (Or Macbook) On

Appendix 3. Specifications for e-portal

Best in Class Storage Solutions for BS2000/OSD

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

How To Build A Cloud Storage System

Large Scale Storage. Orlando Richards, Information Services LCFG Users Day, University of Edinburgh 18 th January 2013

High Performance Computing OpenStack Options. September 22, 2015

Brocade and EMC Solution for Microsoft Hyper-V and SharePoint Clusters

APPENDIX C PRICING INDEX DIR-TSO-2540

EMC Unified Storage for Microsoft SQL Server 2008

Long term retention and archiving the challenges and the solution

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

High Performance Oracle RAC Clusters A study of SSD SAN storage A Datapipe White Paper

XenData Archive Series Software Technical Overview

<Insert Picture Here> Solution Direction for Long-Term Archive

QuickSpecs. Overview. GBS Backup Nederland b.v. v1.4 - October 2015 info@gbsbackup.com 1. GBS_R2.0 Hybrid Backup server

INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT

Using VMware VMotion with Oracle Database and EMC CLARiiON Storage Systems

Seradex White Paper. Focus on these points for optimizing the performance of a Seradex ERP SQL database:

EMC DATA DOMAIN OVERVIEW

RFP - Equipment for the Replication of Critical Systems at Bank of Mauritius Tower and at Disaster Recovery Site. 06 March 2014

Scalable NAS for Oracle: Gateway to the (NFS) future

AFS Usage and Backups using TiBS at Fermilab. Presented by Kevin Hill

Flexible Scalable Hardware independent. Solutions for Long Term Archiving

HP Serviceguard for Linux Certification Matrix

Addendum No. 1 to Packet No Enterprise Data Storage Solution and Strategy for the Ingham County MIS Department

WHITE PAPER BRENT WELCH NOVEMBER

Panasas at the RCF. Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory. Robert Petkus Panasas at the RCF

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage

NET ACCESS VOICE PRIVATE CLOUD

<Insert Picture Here> Cloud Archive Trends and Challenges PASIG Winter 2012

Clusters: Mainstream Technology for CAE

Implementing a Digital Video Archive Based on XenData Software

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

The future is in the management tools. Profoss 22/01/2008

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft. dcache Introduction

HPC Update: Engagement Model

Designing and Deploying a Distributed Architecture for Research Data Management

Data Storage. Vendor Neutral Data Archiving. May 2015 Sue Montagna. Imagination at work. GE Proprietary Information

RO-11-NIPNE, evolution, user support, site and software development. IFIN-HH, DFCTI, LHCb Romanian Team

ETERNUS Storage Systems. Fujitsu America Inc

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

SQL Server Business Intelligence on HP ProLiant DL785 Server

DELL. Dell Microsoft Windows Server 2008 Hyper-V TM Reference Architecture VIRTUALIZATION SOLUTIONS ENGINEERING

Lessons learned from parallel file system operation

Storage Design for High Capacity and Long Term Storage. DLF Spring Forum, Raleigh, NC May 6, Balancing Cost, Complexity, and Fault Tolerance

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

Microsoft SQL Server 2005 on Windows Server 2003

W H I T E P A P E R. Running Microsoft Exchange Server in a Virtual Machine Using VMware ESX Server 2.5

Data storage services at CC-IN2P3

StarWind iscsi SAN Software Hands- On Review

Doc. Code. OceanStor VTL6900 Technical White Paper. Issue 1.1. Date Huawei Technologies Co., Ltd.

SUN HARDWARE FROM ORACLE: PRICING FOR EDUCATION

Certification: HP ATA Servers & Storage

Tape. Emerging Storage Technology Panel IEEE/NASA Conference MSST2008 September 25, 2008

Vendor and Hardware Platform: Fujitsu BX924 S2 Virtualization Platform: VMware ESX 4.0 Update 2 (build )

Technologies of ETERNUS Virtual Disk Library

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (

A Deduplication File System & Course Review

Extend the Benefits of VMware vsphere with NetApp Storage

ClearPath Storage Update Data Domain on ClearPath MCP

ICC MANAGED SERVICES can provide maintenance, support, supply and installation on:

Storage Infrastructure as a Service! Andres Rodriguez, CEO

Big Data and Storage Management at the Large Hadron Collider

Virtualizing Microsoft SQL Server 2008 on the Hitachi Adaptable Modular Storage 2000 Family Using Microsoft Hyper-V

A National Computing Grid: FGI

Building Clusters for Gromacs and other HPC applications

Transcription:

CSC Storage Projects CSC = CSC - IT Center for Science NSC'08 Ari Lukkarinen

CSC Company owned by ministry of education Staff: 150 people Services: Funet Computing Application Information management Data services for science and culture

Funet - Finnish University and research NETwork High quality academic Internet services for Finland Advanced services like IPv6 and IP multicast Core line speed 2,5 Gbps Funet member access at speeds up to 1 Gbps Lightpaths International connectivity through NORDUnet 85 member organizations: All Finnish universities (20) and polytechnics (30) Public research institutions and administrative organizations close to higher education (35) Over 350 000 end users

COMPUTING CRAY XT4/XT5 alias Louhi 2356 AMD Quad Opteron 2,3 GHz CPUs 9424 cores (4048 XT4 + 5376 XT5). Memory ~ 10,3TB Theoretical computing power 86,7 teraflop/s Lustre filesystem (70 TB) HP-CP4000BL ProLiant Supercluster alias Murska 554 dual-core AMD Opteron 2.6 GHz CPUs 2178 cores, Memory ~5 TB Theoretical computing power 11,3 Teraflops HP SFS filesystem (100 TB) Power consumption ~ 1 MW

DATA CSC Storage environment Projects RTVA (Radio and TV archive) HIP/CERN Long Term Storage

Operational environment Universities have a lot of cheap labour available for IT administration. Storage capacity is inexpensive. Research groups/laboratories can easily have systems being able to store tens of terabytes of data. Problem: Who will take care of the data? Organizations and research groups have lot of data, but they do not have the manpower, resources, or knowhow, how to make sure that their data will later be easily accessible and easily usable. They would like someone to do a long term storage solution for them.

CSC Storage environment Storage hardware, midrange Other midrange storage systems: EMC CX300 (10 TB) Fujitsu Siemens FibreCAT CX700 (50 TB) 2 x LSI/Engenio 6998 (70 TB total) 2x HDS AMS1000 (400 TB) + other smaller disk arrays + HP SFS (100 TB)

CSC Storage environment Storage hardware, enterprise Heavy duty databases Not for scientific customers Library systems HDS USP 100

Storage Projects: RTVA Radio and TV Archive, -> National audio visual archive will archive broadcasted TV and radio programs. CSC was chosen to be a technical partner taking care of building up the system. Specs: Number of TV channels, about 10-20. Number of radio channels, about 30. Data volume, about 30 TB/year. Limited number of users. Project started 21.1.2008. New law was needed.

RTVA: Data capture Problem: Nation wide channels are relatively easy to handle. Small local TV/radio operators are problematic, because they require local precense. Subcontractor will be used to capture the data. Another subcontractor will make the metadata software. Schedule: Data capture: Test phase starts 1.11.2008 Production phase will start 1.1.2009

Data storage Metadata Capture Local disk cache Lightpath Streaming server (Linux?) Front end server (Linux) Archive server SAM-FS Local disk cache Local disk cache Tape Local disk containing everything

Reliable data transfer? Storage challenges Dedicated fiber connection (+USB disks) Data transfer tools Rsync + other tools Long term storage issues (intergrity) SAM-FS tuning (verify after write, checksums) Additional site wide data integrity software under preparation Linux filesystem issues Solaris and ZFS initial candidate. XFS propably the best candidate (no RHEL) XFS can not be resized beyond 2 TB

Storage Projects: HIP / LHC HIP (Helsinki Institute of Physics) decided to store date from LHC experiment to CSC. Finland participates three CERN experiments: CMS, Totem, ALICE CSC is now a part of joint nordic Tier- 1 center and a Tier-2 center. Tier-1 centers store the data (disk and tape) Tier-2 centers use the data and compute new physics.

dcache Accelerator creates 40 PB data. 10 TB will be stored to nordic countries. Data managed with dcache software. Dcache virtualizes storage devices. Data has an URL. Physical location available from database. CSC provides 170 TB disk and 70 TB tape capacity. Capacity will be doubled 2009.

dcache Implementation 10 GE dcache Servers (HP Blades) SAN disk Archive server SAM-FS 170 TB ATA disk Local disk cache Tape

dcache SAN HW HDS AMS1000 dcache Servers (HP Blades) HDS USP VM ATA RAID-6 volumes for data HDS AMS1000

CSC Storage environment Storage hardware, virtualization HDS USP VM Why? Large LUNs Distributes load Active-active controllers (hides midrange complexity) Cache 12GB No internal disks 200 TB virtualized storage capacity Old arrays will be virtualized.

2 TB limits XFS Disk arrays dcache Challenges Max LUN count limitation dcache Unsufficient documentation Lack of recovery tools Not a product, but a prototype software. Virtualization tuning

Storage Project: Long Term Storage (LTS) Long Term Storage (10 - years). Problems: Disks and other HW are unreliable. Media change interval is short. People make mistakes. Need for a secure storage.

Secure storage challenges How to garantee the data integrity? Checksums Is the problem on a database or on the data? How tho make sure that the data exists? External database on data, cheksums, cheksums WORM media. Many copies of data Physical separation (more than one data center) Technological separation More than one archive software. Different HW components

LTS, Datacenter challenges Datacenters have to able to communicate with each other Metadata database of all data needed. Files have to have common properties that can be used to identify the data. There must be a way to transfer some or all data from one datacenter to another.

LTS and Finland Government decision: A national digital library (NDL) will created. Information from museums, national archive, and from libraries will be stored into a centralized system. The project includes 35 organizations and 70 persons.

NDL schema Common user interface Incoming material Library Museum Archive Common storage subsystem

Schedule 2008: Planning started 2009: Planning continues, piloting 2010: Storage subsystem building 2011: User interface building 2011: System operational

Challenges Standard metadata set has to be created. Different vocabulary btw. organizations. For example the definition of collection depends on organization. Data formats and migrations from a format to another. Lot of participants. Technological skills of organizations vary. Various needs. All requirments can propably not be met. Schedule

LTS and CSC CSC participates national digital library project. CSC could be one site taking care of an LTS archive. Metadata database program under development. Collect information on filesystems (+ from dcache) to a central database. Intergrity of the data Reporting, file history log Data migrations CSC environment is being evaluated by using TRAC. (Trustworthy Repositories Audit & Certification: Criteria and Checklist)

Storage Challenges Lack of Object Storage Devices!!!! Limits disk array functionality Data Life cycle management Disk array provides disk for tens of servers. Disk array replacement are extremely difficult. SAN networks Is there really need to certificate the whole chain from an application disk array? We need more clever SAN networks (or they will become obsolete).