Distributed File Systems: An Overview
Nürnberg, 30.04.2014
Dr. Christian Boehme, GWDG





Introduction

- A distributed file system allows shared, file-based access without sharing disks
- History starts in the 1960s
- Vast selection for different use cases
- Complex taxonomy
  - Distributed access
  - Federated access
- This presentation covers more recent (free) solutions for typical, current use cases

FhGFS: High Performance Computing
Core Features and Use Cases

[Diagram labels: direct parallel access; clients; metadata server; storage servers]

Core Features
- Distributed files and metadata
- Native support for HPC networks (InfiniBand)
- Easy to set up and maintain
- POSIX support
- Now marketed as BeeGFS

Use Cases
- Data storage for HPC clusters: requires performance, but no high availability
- On-demand provisioning of cross-server storage: requires easy setup, but no high availability
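The "distributed files" and "direct parallel access" points can be illustrated with a small sketch: a file is cut into chunks, the chunks are spread over several storage servers, and a client reading a byte range talks only to the servers that hold the affected chunks. This is a simplified round-robin model, not the actual FhGFS/BeeGFS implementation; the function names, chunk size and server names are assumptions made for illustration.

```python
def stripe_layout(file_size, chunk_size, targets):
    """Map each chunk of a file to a storage target in round-robin order.

    Returns a list of (offset, length, target) tuples.
    Hypothetical helper; real systems keep this pattern in file metadata.
    """
    if chunk_size <= 0 or not targets:
        raise ValueError("chunk_size must be positive and targets non-empty")
    layout = []
    offset = 0
    index = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        layout.append((offset, length, targets[index % len(targets)]))
        offset += length
        index += 1
    return layout


def targets_for_range(layout, start, end):
    """Return the targets a client must contact to read bytes [start, end)."""
    return sorted({target for offset, length, target in layout
                   if offset < end and offset + length > start})
```

With two servers and a chunk size of 4, a 10-byte file yields three chunks alternating between the servers, and a read that crosses a chunk boundary can be served by both servers in parallel.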

Hadoop FS: Big Data
Core Features and Use Cases

[Diagram labels: data access; Client; NameNode; BackupNode; state information]

Core Features
- Same server for data and compute
- Replication prevents data loss
- Part of the Hadoop framework
- Extensive ecosystem of big data tools
  - MapReduce, Pig (computation)
  - HBase (database)
  - Hive (data warehouse)

Use Cases
- Really big data: 5000+ nodes, 100+ PB data per cluster at Yahoo, Facebook...
- Any application using the Hadoop ecosystem: performance and scalability, no POSIX required
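Two of the core features above, replication against data loss and running compute on the server that holds the data, can be sketched in a few lines. The placement below simply rotates through the node list; HDFS's real placement policy is more elaborate (it takes racks into account), so treat this as an illustration under simplified assumptions. All function and node names are hypothetical.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign every block to `replication` distinct nodes (simple rotation)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def lost_blocks(placement, failed_nodes):
    """Blocks are lost only if every node holding a replica has failed."""
    failed = set(failed_nodes)
    return [block for block, replicas in placement.items()
            if all(node in failed for node in replicas)]


def local_node(placement, block, busy=()):
    """Pick a node that already stores the block, so the task runs next to its data."""
    for node in placement[block]:
        if node not in busy:
            return node
    return None  # no replica holder is free; the data would have to travel
```

With four nodes and three replicas per block, any two nodes can fail without losing a block; data disappears only when all three holders of the same block are gone.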

Ceph: Cloud and Data Center Storage
Core Features and Use Cases

Access type     Component   Details
Low-Level API   LIBRADOS    Library access to RADOS: Java, C, C++, Python
Object-Based    RADOSGW     REST, S3 interface
Block-Based     RBD         Block devices, KVM / QEMU
File-Based      CEPH FS     POSIX, kernel, FUSE client

Core Features
- Utilizes compute power of storage nodes (OSDs) and clients
- Data distribution for performance
- Data replication for redundancy
- Easily scalable by adding OSDs
- Self-healing, self-managing reliable autonomic distributed object store (RADOS)

Use Cases
- Cost-efficient, flexible and scalable high-availability storage
- Storage for cloud and virtualization infrastructures (OpenStack)
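"Utilizes compute power of ... clients" means that a client can calculate where an object lives instead of asking a central lookup service. Ceph does this with its CRUSH algorithm; the sketch below swaps in plain rendezvous (highest-random-weight) hashing, a much simpler technique with the same basic property, so it is not Ceph's algorithm. The function names and OSD names are assumptions for illustration.

```python
import hashlib


def _score(object_name, osd):
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def locate(object_name, osds, replicas=2):
    """Return the OSDs that should store the object, primary first.

    Every client that knows the OSD list computes the same answer,
    so no central placement table is needed.
    """
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]
```

When an OSD is added, an object's placement either stays as it was or now includes the new OSD; existing replicas are never shuffled among the old OSDs. That limited data movement is what makes scaling by adding OSDs cheap.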

iRODS: Federated Data Access
Core Features and Use Cases

[Diagram labels: Trier; Karlsruhe; Göttingen; Replication]

Core Features
- Data management middleware
- Rule engine for policy enforcement
- Data replication between sites and data centers
- Creation of federated repositories beyond organizational boundaries
- Transparent access to remote site data from any site in the federation
- Central catalogue of access rights

Use Cases
- Replication of archival research data between data centers (disaster prevention)
- Implementation of data management policies and workflows
- Federated data infrastructure
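The idea of a rule engine for policy enforcement can be sketched as a list of (event, condition, action) rules that is evaluated whenever something happens to a data object. The code below is a generic toy model and does not use the iRODS rule language or API; the class, the event name "put", the "/archive/" path and the dictionary layout are all hypothetical, and the site names are taken from the diagram only as examples.

```python
class RuleEngine:
    """Minimal event/condition/action engine (illustrative only)."""

    def __init__(self):
        self._rules = []

    def rule(self, event, condition, action):
        """Register an action to run when `event` fires and `condition(obj)` holds."""
        self._rules.append((event, condition, action))

    def fire(self, event, obj):
        """Apply every matching rule to the object and collect the results."""
        return [action(obj)
                for rule_event, condition, action in self._rules
                if rule_event == event and condition(obj)]


def replicate_to(site):
    """Build an action that records a replica of the object at `site`."""
    def action(obj):
        obj["replicas"].add(site)
        return f"replicated {obj['path']} to {site}"
    return action


# Example policy: everything stored under /archive/ gets a second copy.
engine = RuleEngine()
engine.rule("put",
            lambda obj: obj["path"].startswith("/archive/"),
            replicate_to("Karlsruhe"))
```

Calling `engine.fire("put", {"path": "/archive/a.dat", "replicas": {"Göttingen"}})` adds Karlsruhe to the object's replica set, while objects outside `/archive/` are left alone. The policy lives in the middleware rather than in each application.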

Conclusion

- Choose a file system with a scope that overlaps well with your use case
- Advanced policy requirements in data federations exceed the scope of typical distributed file systems
- Data management middleware, like iRODS, is a possible choice for realizing distributed data scenarios
- Solutions for simpler site distribution scenarios exist (replication)
- Choosing the wrong file system can be very expensive when you have to migrate petabytes of data
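To get a feeling for why migrating petabytes is expensive, a back-of-the-envelope transfer time helps. The link speed and efficiency used here are assumed example values, not figures from the presentation, and the function name is hypothetical.

```python
def migration_days(data_bytes, link_bits_per_second, efficiency=1.0):
    """Days needed to copy `data_bytes` over a link of the given speed.

    `efficiency` is the fraction of the nominal link speed that is
    actually achieved (an assumption; real values depend on the setup).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86400  # seconds per day


PETABYTE = 10**15          # bytes, decimal definition
TEN_GBIT = 10 * 10**9      # bits per second

one_pb_days = migration_days(PETABYTE, TEN_GBIT)
```

Under these assumptions, one petabyte over a fully used 10 Gbit/s link takes roughly 9.3 days, and twice as long if only half the link speed is reached. This is pure copy time, before any verification or downtime is counted.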

Distributed Filesystem OpenAFS

- Over 20 years old and well tested
- Used by large organizations (CERN, DESY, Stanford Univ. and many others)
- Designed for use over the Internet
- Replicated read-only content
- Open source; very active development
- Available for a broad range of heterogeneous systems including UNIX, Linux, MacOS, Windows, iOS
- Commercial support is available
- http://openafs.org/

OpenAFS

- Uses Kerberos (e.g., Active Directory) for security
- Federated access through Kerberos trust relations
- Encryption of network traffic between clients and servers

Contact

Dr. Christian Boehme
T +49 551 201 1839
F +49 551 201 1576
E christian.boehme@gwdg.de

Oliver Schmitt
T +49 551 39 20512
F +49 551 201 1576
E oliver.schmitt@gwdg.de

GWDG - Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
Am Faßberg 11, 37077 Göttingen
http://www.gwdg.de