XtreemFS Extreme cloud file system?! Udo Seidel



Similar documents
Ceph. A file system a little bit different. Udo Seidel

XtreemFS a Distributed File System for Grids and Clouds Mikael Högqvist, Björn Kolbeck Zuse Institute Berlin XtreemFS Mikael Högqvist/Björn Kolbeck 1

Cloud storage reloaded:

XtreemFS - a distributed and replicated cloud file system

Data Storage in Clouds

Testing of several distributed file-system (HadoopFS, CEPH and GlusterFS) for supporting the HEP experiments analisys. Giacinto DONVITO INFN-Bari

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

BabuDB: Fast and Efficient File System Metadata Storage

Hadoop IST 734 SS CHUNG

Red Hat Cluster Suite

High Availability Solutions for the MariaDB and MySQL Database

Appendix A Core Concepts in SQL Server High Availability and Replication

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

A Comparison of Fault-Tolerant Cloud Storage File Systems

Gluster Filesystem 3.3 Beta 2 Hadoop Compatible Storage

Apache Hadoop. Alexandru Costan

An Oracle White Paper July Oracle ACFS

Ultimate Guide to Oracle Storage

SUSE Linux uutuudet - kuulumiset SUSECon:sta

Veeam Cloud Connect. Version 8.0. Administrator Guide

THE HADOOP DISTRIBUTED FILE SYSTEM

POWER ALL GLOBAL FILE SYSTEM (PGFS)

GoGrid Implement.com Configuring a SQL Server 2012 AlwaysOn Cluster

GlusterFS Distributed Replicated Parallel File System

How To Set Up Egnyte For Netapp Sync For Netapp

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Sector vs. Hadoop. A Brief Comparison Between the Two Systems

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

TimePictra Release 10.0

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Quality of Service (bandwidth limitation): Default is 2 megabits per second.

Postgres Plus xdb Replication Server with Multi-Master User s Guide

The XtreemFS Installation and User Guide Version 1.0

INUVIKA OPEN VIRTUAL DESKTOP FOUNDATION SERVER

Panasas at the RCF. Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory. Robert Petkus Panasas at the RCF

Distributed File Systems

Replicating VNXe3100/VNXe3150/VNXe3300 CIFS/NFS Shared Folders to VNX Technical Notes P/N h REV A01 Date June, 2011

Migrating Your Windows File Server to a CTERA Cloud Gateway. Cloud Attached Storage. February 2015 Version 4.1

Distributed File Systems An Overview. Nürnberg, Dr. Christian Boehme, GWDG

Distributed File System Choices: Red Hat Storage, GFS2 & pnfs

Citrix NetScaler 10 Essentials and Networking

HADOOP MOCK TEST HADOOP MOCK TEST I

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Network Attached Storage. Jinfeng Yang Oct/19/2015

<Insert Picture Here> Oracle Cloud Storage. Morana Kobal Butković Principal Sales Consultant Oracle Hrvatska

Server Installation Manual 4.4.1

Enabling Technologies for Distributed Computing

Volume SYSLOG JUNCTION. User s Guide. User s Guide

Chapter 7. Using Hadoop Cluster and MapReduce

Open-Xchange Server High availability Daniel Halbe, Holger Achtziger

Architectures Haute-Dispo Joffrey MICHAÏE Consultant MySQL

Parallels Cloud Server 6.0

Investigation of storage options for scientific computing on Grid and Cloud facilities

Diagram 1: Islands of storage across a digital broadcast workflow

Architecture and Mode of Operation

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Building Storage as a Service with OpenStack. Greg Elkinbard Senior Technical Director

It should be noted that the installer will delete any existing partitions on your disk in order to install the software required to use BLËSK.

Citrix NetScaler 10.5 Essentials for ACE Migration CNS208; 5 Days, Instructor-led

<Insert Picture Here> Managing Storage in Private Clouds with Oracle Cloud File System OOW 2011 presentation

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Integrating VoltDB with Hadoop

The Future of PostgreSQL High Availability Robert Hodges - Continuent, Inc. Simon Riggs - 2ndQuadrant

GigaSpaces XAP 10.0 Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP DISTRIBUTED SYSTEMS

Introduction to Gluster. Versions 3.0.x

Clustered CIFS For Everybody Clustering Samba With CTDB. LinuxTag 2009

NETASQ ACTIVE DIRECTORY INTEGRATION

Big data management with IBM General Parallel File System

Building Storage Service in a Private Cloud

/ Preparing to Manage a VMware Environment Page 1

GeoGrid Project and Experiences with Hadoop

Replication and Consistency in Cloud File Systems

LDAP User Guide PowerSchool Premier 5.1 Student Information System

Configuring Windows Server Clusters

1. GridGain In-Memory Accelerator For Hadoop. 2. Hadoop Installation. 2.1 Hadoop 1.x Installation

Spectrum Scale HDFS Transparency Guide

(Scale Out NAS System)

Course Venue :- Lab 302, IT Dept., Govt. Polytechnic Mumbai, Bandra (E)

The Hadoop Distributed File System

NetApp Storage System Plug-In for Oracle Enterprise Manager 12c Installation and Administration Guide

MCSE Core exams (Networking) One Client OS Exam. Core Exams (6 Exams Required)

Storage Sync for Hyper-V. Installation Guide for Microsoft Hyper-V

VMware vsphere Data Protection

Linux Powered Storage:

WAN Optimization, Web Cache, Explicit Proxy, and WCCP. FortiOS Handbook v3 for FortiOS 4.0 MR3

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Red Hat Storage Server Administration Deep Dive

Enabling Technologies for Distributed and Cloud Computing

HDFS Users Guide. Table of contents

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

Transcription:

XtreemFS Extreme cloud file system?! Udo Seidel

Agenda Background/motivation High level overview High Availability Security Summary

Distributed file systems Part of shared file systems family Around for a while back in scope Storage challenges More Faster Cheaper XaaS

Shared file systems family Multiple server access the same data Different approaches Network based, e.g. NFS, CIFS Clustered Shared disk, e.g. CXFS, CFS, GFS(2), OCFS2 Distributed, e.g. Lustre, CephFS, GlusterFS... and XtreemFS

Distributed file systems why? More efficient utilization of distributed hardware Storage CPU/Network Scalability... capacity demands Amount I/O requirements

Distributed file systems which? HDFS (Hadoop) CephFS.. GlusterFS.. RedHat... XtreemFS

History European Research project (2006-2010) Part of XtreemOS Linux based grid O/S Member of OpenGridForum Need of distributed file system

XtreemFS and storage Distributed file system => distributed storage Object base storage approach

Storage looking back Not very intelligent Simple and well documented interface, e.g. SCSI standard Storage management outside the disks

Storage these days Storage hardware powerful => Re-define: tasks of storage hardware and attached computer Shift of responsibilities towards storage Block allocation Space management Storage objects instead of blocks Extension of interface -> OSD standard

Object Based Storage I Objects of quite general nature Files Partitions ID for each storage object Separation of meta data operation and storing file data HA not covered at all Object based Storage Devices

Object Based Storage II OSD software implementation Usual an additional layer between between computer and storage Presents object-based file system to the computer Use a normal file system to store data on the storage Delivered as part of XtreemFS File systems: LUSTRE, EXOFS, CephFS

Implementation I Java Supported O/S Linux MacOS X with manual work Free/Net/OpenBSD? No Windows anymore Server and Client (fuse)... both in user space Non-privileged user

Implementation II IP based Different ports for different XtreemFS services Clear text vs. encrypted Object based storage Software implementation OSD features in XtreemFS code Copy on write Snapshotting

XtreemFS the architecture I 4 components Object based Storage Devices Meta Data and Replica Catalogue Servers Directory Service Clients ;-)

XtreemFS the architecture II

XtreemFS services Several OSD MRC Volumes UUID's Abstraction from network Change requires outage Plans for topology

XtreemFS DIR/MRC data Data stored locally BabuDB Independent of OSD Write buffers ASYNC FSYNC SYNC_WRITE Modus SYNC_WRITE_METADATA Description Asynchronous log entry write Fsync() called after log entry write and before ack'ing of operation Synchronous log entry write, ack'ing of operation before meta data update Synchronous log entry write and meta data update before ack'ing of operation

XtreemFS OSD data File cut in 128 Kbyte pieces Default: entire file on one OSD Distribution across multiple OSD's possible RAID 0 implemented RAID 5 planned Parallel reads/writes

XtreemFS interfaces HTTP Read-only Read-write planned Command line All purposes

XtreemFS interfaces

XtreemFS high level summary Multi-platform Abstraction via UUID Communication separation Freedom of choice of OSD backend file system HPC out of scope

XtreemFS HA in general One part: OSD Replication via policies Other part: MRC and DIR Local data stored in BabuDB's Synchronization via BabuDB methods

XtreemFS HA for MRC/DIR Master/slave Master changes -> log file without buffering Log file entries propagation to slaves Quorum needed => at least 3 instances No automation for DIR Synchronization in clear text Encryption via SSL possible

XtreemFS OSD replication File replication Read-only Since 1.0 Easy to handle Read-write Only since 1.3 Later more Copies Full Partial aka on-demand

XtreemFS r/o replication Arbitrary amount of replicas Equally treated replicas Only OSD local access No sync needed Use case Static files :-) Low bandwidth (partial replica) Big static files (partial replica)

XtreemFS r/w replication Primary/secondary Election on demand with leases Read/write access First primary Propagated to secondaries

XtreemFS r/w replication - failure Secondary Behaviour configurable Write failure vs. Write on remaining Quorum needed Primary Behaviour configurable Write failure vs. Write on remaining Quorum needed

XtreemFS OSD/replica policies OSD selection for new files Replica selection for new/additional copies Categories: filter, group, sort Combination of rules Policy Standard OSD FQDN based UUID based Data center topology random Category filter filter, group, sort filter group, sort sort

XtreemFS HA summary Homework needed for DIR and MRC OSD Lateness of OSD read-write replication OSD Read-only replication Mature and WAN ready Access time improvement via striping Flexibility of policies

XtreemFS encryption Not on file system level For communication Interaction of DIR, MRC and OSD Data replication for HA for DIR and/or MRC

XtreemFS channel encryption Via SSL PCKS#12 or Java Key Store (JKS) Locally stored service/client certificates root CA certificates Two modes All-Or-Nothing approach Grid-SSL just authentication

XtreemFS secure channel encryption Password protection of certificates MRC/DIR/OSD: stored service configuration Client: via CLI!!

XtreemFS encryption summary Data encryption on POSIX layer? SSL obvious choice for TCP/IP channels Missing PKI contradicts scalability Password protection needs re-design

Summary High self-defined goals Some dropped? Some partially implemented Ok for R&D Labs HA and housekeeping improvement needed Encryption w/o PKI

References http://www.xtreemfs.org http://babudb.googlecode.com

Thank you!