Research Data Storage Infrastructure (RDSI) Project. DaSh Straw-Man
- Imogen Wright
1 Research Data Storage Infrastructure (RDSI) Project DaSh Straw-Man
2 Recap from the Node Workshop (Cherry-picked) *Higher Tiered DCs cost roughly twice as much as Lower Tiered DCs. * However, co-operating Lower Tiered DCs can provide a robust Higher Tiered-like service. * Using distributed and/or replicated mechanisms. * If a service (partially) fails, another DC can temporarily provide it. * If a DC fails, other DCs can provide its services temporarily. *Loss of service is pardonable. Loss of data is unforgivable. *Need to provide concrete assurances to the end user.
3 *What's DaSh all about? * Developing sufficient elements of potential technical architectures for data interoperability and sharing. * So that its use can be appropriately specified in the call for nodes proposal. * Mile-high view of technical architectures to get data into and out of the RDSI node(s). *Ensure (meta)data durability and curation. * Loss of (meta)data is a capital offence. *Ensure data scalability. * Storage capacity, moving data into and out of the node(s). *Ensure end-user usability. * Provide a good end-user experience. *The DaSh straw-man seeks community opinion on the various possible architectures.
4 Building Blocks (diagram). Re-exported file systems (NFS, CIFS, WebDAV, FUSE); HSM, tiers and storage classes with protocol negotiation via SRM; wide-area transfers (gsiftp, https, dcap, DPM, xrootd); REST/S3 interfaces to clouds and GRIDs.
5 *iRODS and Federation *Federation is a feature in which separate iRODS zones (iRODS instances) can be integrated. * When zones 'A' and 'B' are federated, they work together. * Each zone continues to be separately administered. * Users in the multiple zones, if given permission, are able to access data and metadata in the other zones. * No user passwords are exchanged. * Zone admins set up trust relationships with other zones.
6 ARCS Data Fabric (diagram). An iCAT-only catalogue server hosted on the NeCTAR NSP, connected to seven iRODS servers, four of which are also backed by tape.
7 Node's Eye View (N=6). No federation.
8 Node's Eye View (N=6). Too much federation. Too much confusion!!
9 Node's Eye View (N=6). Just right federation (diagram): one Master ICAT with Slave ICATs at the other nodes.
10 Distributed vs Federated (diagram). A distributed fault-tolerant parallel FS spanning the N=6 nodes, re-exported through the same building blocks: HSM, tiers and storage classes with protocol negotiation via SRM; wide-area transfers (gsiftp, https, dcap, DPM, xrootd); REST/S3 to clouds and GRIDs; NFS, CIFS, WebDAV, FUSE.
11 Distributed Pros and Cons *Distributed over a larger number of nodes. * Geographic scaling as well as node scaling. * Inherent data replication. *Fault Tolerant. * A storage brick takes a lickin' but the service keeps on tickin'. * A node takes a lickin' but the service keeps on tickin'. *Parallel I/O. * All nodes can participate to move data. High aggregate BW. *Single global namespace. * Rather than separate logical namespaces. *Cost Effective. * Use cheap hardware. Big disks over fast disks. * Design to expect failures.
12 File Replication *Whole file * Duplicated and stored on multiple bricks. *Slices of file * File sliced and diced, slices stored on multiple bricks. * A single brick may not contain the whole file. * Erasure Codes * Parity blocks * (used in RAID) * Reed-Solomon * Over-sampled polynomial constructed from the data. * Add erasure codes and slice the file * Need M of N pieces to recover the file (M < N) * Can store a slice on multiple bricks. Extra redundancy.
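As an aside, the simplest erasure code mentioned above, a single XOR parity piece as used in RAID, is easy to sketch: cut the file into M data slices, add one parity slice, and any M of the resulting N = M + 1 pieces are enough to rebuild the file. The Python sketch below is a toy with illustrative sizes, not any node's actual replication layer.

    import os
    from functools import reduce

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data: bytes, m: int):
        """Split data into m equal slices plus one XOR parity slice (N = m + 1)."""
        slice_len = -(-len(data) // m)              # ceiling division
        padded = data.ljust(m * slice_len, b"\0")   # pad so all slices line up
        slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(m)]
        return slices + [reduce(xor, slices)], len(data)

    def decode(pieces, size: int, m: int) -> bytes:
        """Rebuild the file from any m of the m + 1 pieces (one piece may be None)."""
        missing = [i for i, p in enumerate(pieces) if p is None]
        assert len(missing) <= 1, "a single parity piece tolerates only one loss"
        if missing:
            pieces = list(pieces)
            pieces[missing[0]] = reduce(xor, [p for p in pieces if p is not None])
        return b"".join(pieces[:m])[:size]          # drop the parity and the padding

    data = os.urandom(10_000)
    pieces, size = encode(data, m=4)
    pieces[2] = None                                # simulate a lost storage brick
    assert decode(pieces, size, m=4) == data

Reed-Solomon generalises this from "any M of M + 1" to "any M of N" for an arbitrary number of parity pieces, at the cost of more computation.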
13 SurfNET Survey of Wide Area Distributed Storage (Circa 2010) [1/4] Requirements: *Scalable. * Capacity, performance and concurrent access. * Expandable storage without degrading performance. *High Availability. * Keeps data available to apps and clients. * Even in the event of a malfunction. * Or system reconfiguration. * Needs to replicate data to multiple locations.
14 SurfNET Survey of Wide Area Distributed Storage (Circa 2010) [2/4] *Durability. * No data is lost from a single software or hardware failure. * Automatically maintain a minimum number of replicas. * Support backup to tape. *Performance at Traditional SAN/NAS Level. * Comparable performance to traditional non-distributed SAN/NAS. *Dynamic Operation. * Availability, durability and performance configurable per application. * Reduces costs by not running at the highest support level all the time. * Allow users, apps and sysadmins to balance cost vs features. * System should be self-configurable and self-tunable. * Support data movement between different storage technologies. * Tiered functionality. Classes of Storage.
15 SurfNET Survey of Wide Area Distributed Storage (Circa 2010) [3/4] *Cost Effective. * Must be possible to build, configure, run and maintain in a cost-effective manner. * Must work with commodity hardware. * Hardware may not be as reliable as high-end hardware. * Configuration of the system and its maintenance must be easy and straightforward. * Operation of the system is energy efficient. * Licence fees for software, where applicable, must be limited. *Generic Interfaces. * System offers generic interfaces to apps and clients. * POSIX interface. POSIX/NFSv4.1 semantics. * Block device (iSCSI, etc.).
16 SurfNET Survey of Wide Area Distributed Storage (Circa 2010) [4/4] *Protocols Based on Open Standards. * System built using open protocols. * Reduces vendor lock-in. * More economical in the long run. *Multi-Party Access. * System must support access by multiple geographically dispersed parties at the same time. * Promotes collaboration between these parties.
17 SurfNET Survey of Wide Area Distributed Storage.(Circa 2010) Candidates *Lustre *GlusterFS *GPFS *Ceph *+ dcache Non-Candidates *XtreemFS *MogileFS *NFS v4.1 (pnfs) *ZFS *VERITAS FS *Parascale *CAStor *Tahoe-LAFS *DRBD
18 Nordic DataGrid Facility (dcache)
19 The DEISA Global File System at European Scale (Multi-Cluster General Parallel File System)
20 TeraGrid (GPFS & Lustre)
21 SurfNET Survey of Wide Area Distributed Storage + dCache (columns: Lustre / GlusterFS / GPFS / Ceph / dCache). *Owner: Oracle / Gluster / IBM / Newdream / dcache.org. *Licence: GNU GPL / GNU GPL / commercial / GNU GPL / DESY. *Data primitive: object (file) / object (file) / block / object (file) / object (file). *Data placement: round robin + free space heuristics / different strategies via modules / policy based / placement groups, random mappings / policy based. *Metadata: max 2 metadata servers / stored with file / distributed over storage servers / multiple metadata servers / pnfs (postgresql). *Storage tiers: pools of object targets / unknown / policy defined / CRUSH rules / policy defined.
22 SurfNET Survey of Wide Area Distributed Storage + dCache (columns: Lustre / GlusterFS / GPFS / Ceph / dCache). *Failure handling: assuming reliable nodes / assuming unreliable nodes / assuming reliable nodes, failure groups / assuming unreliable nodes / assuming reliable nodes. *Replication: server side (failover pairs) / client side / server side / server side / server side. *WAN deployment example: TeraGrid / City Cloud (Swedish IaaS provider) / TeraGrid, DEISA / unknown / Fermilab, Swegrid, NDGF. *Client interface: native client, FUSE, CIFS, NFS / native client, FUSE / native client, exports NFSv3, CIFS, pcifs, WebDAV, SRM (StoRM) / native client, FUSE / NFSv4.1, HTTP, WebDAV, GridFTP, Xrootd, SRM, dcap. *Node types: clients, metadata, objects / client, data / client, data / clients, metadata, objects / clients, metadata, objects.
23 WAN Data Caching and Performance. Bringing data closer to where it is consumed. *Researchers are naturally distributed over the city and country. *Some may not benefit from the high-speed networks provided by AARNet and the NRN due to their location. *Can RDSI help these spatially disenfranchised? *Yes (sort of). *Take the model of Content Delivery Networks. * e.g. Akamai, Amazon CloudFront, etc. * Web content, videos etc. are cached close to the end user. *But focus on data caching rather than content caching. *May not provide the same experience as the spatially franchised. * But every bit helps!
24 WAN Data Caching with GPFS.
25 WAN Data Caching Continued *dCache is a distributed cache system. * Locate a dCache pool close to the spatially disenfranchised. * The dCache admin can populate required data collections to the spatially disenfranchised using standard SRM processes. * Potentially a (reasonably) fast parallel transfer. *BioTorrents * Allows scientists to rapidly share their results, datasets and software using the popular BitTorrent file sharing technology. * All data is open-access and illegal file sharing is not allowed on BioTorrents. * Or RDSI can provide BitTorrent seeders from its own nodes. * Ignoring the bad press, BitTorrent is very good at what it does.
26 Data Durability. Things that go bump in the night (or not!) *Data Durability is an absolute necessity. *RDSI must provide a safe and enduring home for research data. * This might be more difficult than it appears! *The enemy is *Physics. *The world is a complex quantum/probabilistic system. * And so is all your computing and storage infrastructure. *Random events in your infrastructure will create: *Bit Rot and Silent Corruptions. *But you can engineer around the laws of physics.
27 Data Durability. Sources of Bit Rot and Silent Corruptions (diagram). Errors can enter at every layer between user space and the physical media: memory (ECC errors, flipped bits), the VM and filesystem layers, the block and SCSI layers, low-level drivers, controller and storage firmware (bugs, inter-op issues), disk mechanics and the magnetic media itself (wear-out, latent sector errors), all interconnecting cables, and the environment (cosmic rays/sun spots, EM radiation, etc.). The results include corrupted data and metadata, lost writes, torn writes and misdirected writes. From Silent Corruptions, Peter Kelemen, CERN.
28 Data Durability. Expected Background Bit Error Rate (BER) * NIC/Link/HBA: 1 bit in ~1.1 GB. * Check-summed, retransmitted if necessary. * Memory: 1 bit in ~116 GB. * ECC. * SATA Disk: 1 bit in ~11.3 TB. * Various error correction codes. * Enterprise Disk: 1 bit in ~113 TB. * Various error correction codes. * Tape: 1 bit in ~1.11 PB. * Various error correction codes. * Data may be encoded five or more times as it travels between user space and physical disk/tape. * At petascale, incredibly infrequent events happen all the time. From Silent Corruptions, Peter Kelemen, CERN.
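To make these rates concrete, here is a back-of-the-envelope sketch (Python) of the raw bit errors each layer would be expected to produce while moving data; the 1 PB volume is an illustrative assumption, and the per-layer rates are the ones quoted above.

    volume_bytes = 1e15                         # assume 1 PB moved end to end

    bytes_per_bit_error = {                     # rates quoted on the slide
        "NIC/Link/HBA":    1.1e9,               # 1 bit in ~1.1 GB
        "Memory (ECC)":    116e9,               # 1 bit in ~116 GB
        "SATA disk":       11.3e12,             # 1 bit in ~11.3 TB
        "Enterprise disk": 113e12,              # 1 bit in ~113 TB
        "Tape":            1.11e15,             # 1 bit in ~1.11 PB
    }

    for layer, rate in bytes_per_bit_error.items():
        print(f"{layer:>16}: ~{volume_bytes / rate:,.0f} raw bit errors expected")

Roughly 900,000 link-level errors, several thousand memory errors and a handful of disk errors per petabyte: almost all are caught by the per-layer checksums and ECC listed above, and the durability problem is the small residue that slips through every layer silently.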
29 Data Durability. The errors you know. The errors you don't know. There are known errors; there are errors we know we know. We also know there are known unknown errors; that is to say we know there are some things we do not know. But there are also unknown unknown errors; the ones we don't know we don't know. Paraphrased from Donald Rumsfeld. From Silent Corruptions, Peter Kelemen, CERN.
30 Data Durability. The errors you know. The errors you don't know. *There are data errors that you will know about. * Log messages. * SMART messages. * Detection: SW/HW-level with error messages. * Correction: SW/HW-level with warnings. * If you're really lucky your kernel will panic, so you'll know something happened. *There are data errors that you will never know about. * As far as your storage infrastructure knows, that write/read was executed perfectly. * In reality you will probably never know the data has been corrupted. * (Unless you design for this eventuality.) From Silent Corruptions, Peter Kelemen, CERN.
31 Data Durability. How to discover the unknown unknowns. * Checksums * (CRC32, MD5, SHA1, ...) * Checksum the (meta)data. * Transport the checksum with the (meta)data for later comparison. * Error detection and correction codings. * Detect errors caused by noise, etc. (See checksums.) * Correct detected errors and reconstruct the original, error-free data. * Backward error correction: * Automatic retransmit on error detection. * Forward error correction: * Encode extra redundant data. * Regenerate data from forward error codes. * Multiple copies with quorum. From Silent Corruptions, Peter Kelemen, CERN.
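A minimal sketch (Python) of the "transport the checksum with the (meta)data for later comparison" idea: record a digest when an object is ingested, keep it as metadata next to the file, and recompute it on every read or scrub. The .meta sidecar file and the choice of SHA-1 are illustrative assumptions, not part of any RDSI design.

    import hashlib
    import json
    import os

    def sha1_of(path: str, chunk: int = 1 << 20) -> str:
        """Stream the file through SHA-1 a megabyte at a time."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        return h.hexdigest()

    def ingest(path: str) -> None:
        """Record the checksum alongside the file when it enters the store."""
        meta = {"sha1": sha1_of(path), "size": os.path.getsize(path)}
        with open(path + ".meta", "w") as f:
            json.dump(meta, f)

    def verify(path: str) -> bool:
        """Recompute on read/scrub; False means a silent corruption was detected."""
        with open(path + ".meta") as f:
            expected = json.load(f)["sha1"]
        return sha1_of(path) == expected

Detection alone is not correction: a failed comparison has to trigger a rebuild from a replica, an erasure-coded reconstruction, or a quorum of copies, as listed above.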
32 Data Durability. Silent Corruptions and CERN * Circa … PB tape, 4 PB disk, … nodes, … drives, 1200 RAID. * Probabilistic storage integrity check (fsprobe) on 4000 nodes. * Write a known bit pattern. * Read it back. * Compare and alert when a mismatch is found. * 6 cycles of over 1 hour each. * Low I/O footprint for background operation on a 2 GB file. * Keep complexity to the minimum. * Use static buffers. * Attempt to preserve details about detected corruptions for further analysis. From Silent Corruptions, Peter Kelemen, CERN.
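The CERN tool itself is not reproduced here, but the idea on the slide is simple enough to sketch in Python: repeatedly write a known pattern from a static buffer, read it back, and report any mismatch. The file size, path and sleep interval below are scaled-down assumptions, not the CERN values, and a real probe would also bypass or drop the page cache so the read actually hits the media.

    import os
    import time

    PATTERN = bytes(range(256)) * 4096            # static 1 MiB known bit pattern
    FILE_SIZE = 256 * 1024 * 1024                 # scaled down (CERN used a 2 GB file)
    PROBE_FILE = "/tmp/fsprobe.dat"               # illustrative path

    def probe_cycle() -> list:
        """Write the pattern, read it back, return the offsets that mismatch."""
        blocks = FILE_SIZE // len(PATTERN)
        with open(PROBE_FILE, "wb") as f:
            for _ in range(blocks):
                f.write(PATTERN)
            f.flush()
            os.fsync(f.fileno())                  # push the data towards the storage
        bad = []
        with open(PROBE_FILE, "rb") as f:
            for i in range(blocks):
                if f.read(len(PATTERN)) != PATTERN:
                    bad.append(i * len(PATTERN))
        return bad

    for cycle in range(6):                        # the slide: 6 cycles of ~1 hour each
        mismatches = probe_cycle()
        if mismatches:
            print(f"cycle {cycle}: possible silent corruption at offsets {mismatches[:10]}")
        time.sleep(10)                            # keep the I/O footprint low
    os.remove(PROBE_FILE)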
33 Data Durability. Silent Corruptions and CERN *2000 incidents reported over 97 PB of traffic. * 6/day observed on average! * 192 MB of silently corrupted data. *320 nodes affected over 27 hardware types. *Multiple types of corruptions. *Some corruptions are transient. *Overall BER, considering all the links in the chain: * ~3x10^-7. *Not the spec'd rates. From Silent Corruptions, Peter Kelemen, CERN.
34 Data Durability. Types of Silent Corruptions * Type I * Single/double bit flip errors. Usually persistent. * Usually bad memory (RAM, cache, etc.). * Happens with expensive ECC memory too. * Type II * Small, 2^n-sized random chunks (… bytes) of unknown origin. * Usually transient. * Possibly the OOM killer or a corrupted SLAB/SLUB allocator. * Type III * Multiple large chunks of 64K of old file data. I/O command timeouts. * Usually persistent. * Type IV * Various-sized chunks of zeros. From Silent Corruptions, Peter Kelemen, CERN.
35 Data Durability. What Can Be Done? *Self-examining/healing hardware. *WRITE-READ cycles before ACK. *Check-summing, though not necessarily enough. *End-to-end check-summing. *Store multiple copies. *Regular scrubbing of RAID arrays. *Data refresh. Re-read cycles on tapes. *Generally, accept and prepare for corruptions. From Silent Corruptions, Peter Kelemen, CERN.
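One of the cheaper items on this list, WRITE-READ cycles before ACK, can be sketched directly in Python: do not acknowledge an ingest until the bytes have been flushed, read back and found to match their checksum. A hedged illustration of the idea, not any particular node's ingest path; as with the probe above, a production version would bypass the page cache for the read-back.

    import hashlib
    import os

    def write_then_verify(path: str, data: bytes) -> None:
        """Write, flush to stable storage, then read back and compare before ACKing."""
        expected = hashlib.sha256(data).hexdigest()
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                  # force the data out of volatile buffers
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            raise IOError(f"read-back mismatch on {path}; refusing to acknowledge the write")
        # only now report success to the client and/or replicate to the other nodes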
36 Data Durability. The solutions. ZFS. The Good. * Developed by Sun (now Oracle) on Solaris. * Designed from the ground up with a focus on data integrity. * Combined filesystem and logical volume manager. * RAID-Z, RAID-Z2, RAID-Z3 or mirrored. * Copy-on-write. Transactional operation. * Built-in end-to-end data integrity. * Data/metadata checksums all the way to the root. * Always consistent on disk; no fsck or journaling. * Automatic self-healing. * Intelligent online scrubbing and resilvering. * Very large filesystem limits. Max. 256 ZB FS. * Deduplication. * Snapshots, and much much more.
37 Data Durability. The solutions. ZFS. The Bad. *Supported on Solaris only. * OpenSolaris is no more. *Kernel ports for FreeBSD and NetBSD. * Using OpenSolaris kernel source code. *Linux port via ZFS-FUSE. * Kernel space good. User space not so good. *ZFS on Linux. * Supported by Lawrence Livermore National Laboratory. * Issues with CDDL and GPL licence compatibility in the kernel. * Solaris Porting Layer/shim to the rescue. * Currently v0.6.0-rc4. It worked for me, but it is not production grade yet.
38 Data Durability. The solutions. ZFS for Lustre. *1999: Peter Braam from CMU creates Lustre. * A GPL massively parallel distributed file system. * 2003: Braam creates Cluster File Systems Inc to continue the work. * 2007: Sun acquires Cluster File Systems Inc. * Works to combine ZFS and Lustre. * A high-performance parallel FS with end-to-end data integrity. * But only supported on Solaris. *2009: LLNL starts porting the ZFS kernel code to Linux. * Oracle acquires Sun. *2010: Oracle announces ZFS/Lustre only for Solaris. *2011: LLNL starts the ZFS/Lustre port for Linux. *Late 2011: LLNL plans a ZFS/Lustre FS. * 50 PB. 512 GB/s - 1 TB/s bandwidth.
39 Data Durability. The solutions. DataDirect Networks S2S Technology. *SATA storage with: * Enterprise-class performance. * Reliability and data integrity. * Automatic self-healing. * Detects anomalies and begins journaling all writes while recovering operations. *Dynamic MAID (D-MAID). * Saves additional power and cooling by powering down the platters, * where over 80% of power is consumed. * DC friendly.
40 Community Input Time. *Are we barking up the right tree? *Are we barking up the wrong tree? *Is there even a tree in the first place? *You decide.
41 Building Blocks *Are the base building blocks sufficient? * If not, what should be added? *Is there a need for additional data transfer protocols? * If so, what should be added? *Is there a need for additional file system protocols? * If so, what should be added? *What additional public cloud storage infrastructure should RDSI consider? *What additional private cloud storage infrastructure should RDSI consider?
42 Federated vs Distributed. *Should RDSI continue to embrace the federated iRODS model? *Should RDSI embrace the distributed FS model? *Should RDSI embrace both the federated and the distributed model?
43 Distributed Fault Tolerant Parallel Filesystems. *If RDSI chooses to use a distributed fault-tolerant parallel filesystem component, are there such systems that we have not yet considered?
44 WAN Data Caching There are always going to be researchers who may not be able to benefit from the high-speed networks provided by AARNet and the NRN. WAN Data Caching may partially eliminate their disadvantage, but at a cost. *Should RDSI consider the use of WAN Data Caches? *If so, what sites would benefit from these data caches?
45 Data Durability. Data Durability is one of the foremost challenges for RDSI. However, it seems impossible to entirely eliminate the various issues of bit rot and silent corruptions. *Given this fact of nature, what level of data durability is the research community willing to accept?
Reference: Survey of Technologies for Wide Area Distributed Storage. GigaPort3 project, 2010. Project manager: Rogier Spoor. Author(s): Arjan Peddemors, Christiaan … Completion date: 2010-06-29. Version 1.0.
