Petabyte Scale Out Rocks

Petabyte Scale Out Rocks CEPH as Replacement for OpenStack's Swift and Co. Dr. Udo Seidel Manager of Linux Strategy @ Amadeus useidel@amadeus.com

Agenda Background Architecture of CEPH CEPH Storage OpenStack Dream Team: CEPH and OpenStack Summary

Background

Me :-) Teacher of mathematics and physics PhD in experimental physics Started with Linux in 1996 Linux/UNIX trainer Solution engineer in HPC and CAx environments Head of the Linux Strategy team @Amadeus

CEPH What? So-called parallel distributed cluster file system Started as part of PhD studies at UCSC Public announcement in 2006 at the 7th OSDI File system shipped with the Linux kernel since 2.6.34 Name derived from 'cephalopod', hence the pet octopus mascot

Shared File Systems Short Intro Multiple servers access the same data Different approaches: Network-based, e.g. NFS, CIFS Clustered shared disk, e.g. CXFS, CFS, GFS(2), OCFS2 Distributed parallel, e.g. Lustre... and CEPH

Architecture of CEPH

CEPH and Storage Distributed file system => distributed storage Does not use traditional disks or RAID arrays Does use so-called OSDs: Object-based Storage Devices, i.e. intelligent disks

Object Based Storage I Objects of quite general nature: files, partitions ID for each storage object Separation of metadata operations and storing of file data HA not covered at all

Object Based Storage II OSD software implementation Usually an additional layer between computer and storage Presents an object-based file system to the computer Uses a normal file system to store the data on the storage Delivered as part of CEPH Other file systems using OSDs: Lustre, EXOFS
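
To make the layering concrete, here is a minimal sketch (Python, all names hypothetical) of what such an OSD software layer does: it presents a flat object interface keyed by object ID while keeping the bytes as plain files on a normal local file system, just as the slide describes.

    import os
    import hashlib

    class ToyOSD:
        """Toy object-based storage device: an object interface on top of
        a normal file system (a stand-in for what the real OSD layer does)."""

        def __init__(self, root):
            self.root = root
            os.makedirs(root, exist_ok=True)

        def _path(self, oid):
            # Hash the object ID so any ID maps to a valid file name.
            return os.path.join(self.root, hashlib.sha1(oid.encode()).hexdigest())

        def put(self, oid, data):
            with open(self._path(oid), 'wb') as f:
                f.write(data)

        def get(self, oid):
            with open(self._path(oid), 'rb') as f:
                return f.read()

    osd = ToyOSD('/tmp/toy-osd')
    osd.put('10000000000.00000000', b'first stripe of some file')
    assert osd.get('10000000000.00000000').startswith(b'first')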

CEPH The Full Architecture I 4 components: Object-based Storage Devices - any computer, form a cluster (redundancy and load balancing) Metadata Servers - any computer, form a cluster (redundancy and load balancing) Cluster Monitors - any computer Clients ;-)

CEPH The Full Architecture II [Diagram: CEPH clients perform metadata I/O against the MDS cluster and data I/O against the OSD cluster; on the client, user space sits on top of the POSIX/VFS layer and the CEPH code in the Linux kernel]

CEPH Storage

CEPH and OSD Userland implementation Any computer can act as an OSD Uses XFS or BTRFS as the native file system since 2009, before that the self-developed EBOFS Provides functions of the OSD-2 standard: copy-on-write, snapshots No redundancy on disk or even computer level

CEPH and OSD File Systems XFS strongly recommended BTRFS planned as default Non-default configuration for mkfs EXT4 possible, but XATTR (size) is key -> EXT4 less recommended Leveldb to bridge existing gaps
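
The XATTR remark can be probed directly: the OSDs keep per-object metadata in extended attributes, and EXT4 only offers a small per-inode xattr area, which is the gap leveldb bridges. A hedged illustration (path and attribute name are made up):

    import errno
    import os

    path = '/tmp/xattr-probe'   # hypothetical file on the OSD's file system
    open(path, 'w').close()

    try:
        # Try to store a large (8 KiB) attribute, as CEPH's OSDs may need to.
        os.setxattr(path, 'user.ceph-meta', b'x' * 8192)
        print('large xattrs fine -> suitable as OSD file system')
    except OSError as e:
        # ERANGE/ENOSPC on e.g. EXT4's limited xattr space: the case
        # where leveldb has to bridge the gap.
        print('xattr rejected (%s)' % errno.errorcode[e.errno])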

OSD Failure Approach Any OSD is expected to fail New OSDs dynamically added/integrated Data distributed and replicated Redistribution of data after a change in the OSD landscape

Data Distribution File striped File pieces mapped to object IDs Assignment of a so-called placement group to each object ID via hash function Placement group (PG): logical container of storage objects Calculation of the list of OSDs out of the PG via the CRUSH algorithm

CRUSH I Controlled Replication Under Scalable Hashing (CRUSH) Considers several pieces of information: cluster setup/design, actual cluster landscape/map, placement rules Pseudo-random -> quasi-statistical distribution Cannot cope with hot spots Clients, MDS and OSDs can all calculate object locations

CRUSH II [Diagram: file -> objects (ino, ono) -> oid -> hash(oid) -> pgid -> PGs -> CRUSH(pgid) -> (osd1, osd2) -> OSDs]
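
The whole pipeline from the diagram fits in a few lines. The sketch below is a toy model, not real CEPH code: object naming and the hash-to-PG step follow the slides, while a simple rendezvous hash stands in for the real CRUSH function. It also shows why a change in the OSD landscape only triggers partial redistribution: the object-to-PG mapping never changes, only the OSD list computed from the PG.

    import hashlib

    PG_NUM = 64

    def object_id(ino, ono):
        # CEPH-style object name: inode number plus object (stripe) number.
        return '%x.%08x' % (ino, ono)

    def pg_of(oid):
        # hash(oid) -> pgid: stable assignment of objects to placement groups.
        return int(hashlib.md5(oid.encode()).hexdigest(), 16) % PG_NUM

    def crush(pgid, osds, replicas=2):
        # Toy stand-in for CRUSH via rendezvous hashing: deterministic, so
        # clients, MDS and OSDs all compute the same answer, and removing
        # an OSD only remaps the PGs that actually used it.
        def score(osd):
            return hashlib.md5(('%d-%s' % (pgid, osd)).encode()).hexdigest()
        return sorted(osds, key=score)[:replicas]

    osds = {'osd1', 'osd2', 'osd3', 'osd4'}
    oid = object_id(0x10000000000, 0)          # first stripe of some file
    pgid = pg_of(oid)
    print(oid, '-> PG', pgid, '->', crush(pgid, osds))
    print(oid, '-> PG', pgid, '->', crush(pgid, osds - {'osd4'}))  # after a failure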

Data Replication N-way replication: N OSDs per placement group, OSDs in different failure domains First non-failed OSD in the PG -> primary Reads and writes go to the primary only Writes forwarded by the primary to the replica OSDs Final write commit after all writes on the replica OSDs Replication traffic stays within the OSD network
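
A schematic version of that write path (pure simulation, not CEPH code): the client only ever talks to the primary, which forwards the write and acknowledges once every replica has committed.

    osd_disks = {}  # per-OSD object store, a stand-in for the real back ends

    def store(osd, oid, data):
        osd_disks.setdefault(osd, {})[oid] = data
        return True  # commit acknowledgement

    def client_write(pg_osds, oid, data, failed=frozenset()):
        alive = [osd for osd in pg_osds if osd not in failed]
        primary, replicas = alive[0], alive[1:]  # first non-failed OSD is primary
        acks = [store(r, oid, data) for r in replicas]  # primary forwards the write
        acks.append(store(primary, oid, data))
        return all(acks)                         # final commit: all replicas done

    assert client_write(['osd1', 'osd2', 'osd3'], 'obj-1', b'payload',
                        failed={'osd1'})         # osd2 takes over as primary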

CEPH clustermap I Objects: computers and containers Container: bucket for computers or other containers Each object has an ID and a weight Maps physical conditions: rack location, fire cells

CEPH clustermap II Reflects data rules: number of copies, placement of copies Updated versions sent to OSDs OSDs distribute the cluster map within the OSD cluster Each OSD re-calculates via CRUSH: PG membership, data responsibilities, order (primary or replica) New I/O accepted only after information sync

CEPH - RADOS Reliable Autonomic Distributed Object Storage (RADOS) Direct access to the OSD cluster via librados Supported: C, C++, Java, Python, Ruby, PHP Drops/skips the POSIX layer (CEPHFS) on top Visible to all CEPH members => shared storage
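
With the Python binding this is only a handful of calls. A minimal sketch, assuming a running cluster, a readable /etc/ceph/ceph.conf and an existing pool named data (the pool name is an assumption):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    ioctx = cluster.open_ioctx('data')            # RADOS pool, assumed to exist
    ioctx.write_full('greeting', b'hello RADOS')  # store an object, no POSIX layer
    print(ioctx.read('greeting'))

    ioctx.close()
    cluster.shutdown()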

CEPH Block Device Aka RADOS Block Device (RBD) Second part of the kernel code (since 2.6.37) RADOS storage exposed as a block device (/dev/rbd) qemu/kvm storage driver via librados/librbd Alternative to: shared SAN/iSCSI for HA environments, storage HA solutions for qemu/kvm and Xen Shipped with SUSE Cloud
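
The same cluster can be driven through librbd. A hedged sketch with the Python binding (pool and image names are made up):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')                  # default RBD pool assumed

    rbd.RBD().create(ioctx, 'vm-disk-1', 4 * 1024**3)  # 4 GiB image
    image = rbd.Image(ioctx, 'vm-disk-1')
    try:
        image.write(b'\0' * 512, 0)                    # behaves like a block device
    finally:
        image.close()

    ioctx.close()
    cluster.shutdown()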

RADOS A Part of the Picture [Diagram: /dev/rbd and qemu/kvm talk the RBD protocol to CEPH-osd]

CEPH Object Gateway (RGW) Aka RADOS Gateway RESTful API: Amazon S3 -> s3 tools work OpenStack Swift interface Proxies HTTP to RADOS Tested with Apache and lighttpd
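
Because the gateway speaks the S3 dialect, stock S3 tooling works against it. A sketch with the classic boto library, where endpoint and credentials are placeholders for a gateway user:

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',        # RGW user credentials (placeholders)
        aws_secret_access_key='SECRET_KEY',
        host='rgw.example.com',                # the CEPH Object Gateway endpoint
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    bucket = conn.create_bucket('demo-bucket')
    key = bucket.new_key('hello.txt')
    key.set_contents_from_string('served by RADOS, spoken S3')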

CEPH Takeaways Scalable Flexible configuration No SPOF, yet based on commodity hardware Accessible via different interfaces

OpenStack

What? Infrastructure as a Service (IaaS) 'Open source version' of AWS :-) New versions every 6 months Current (2013.2) is called Havana Previous (2013.1) is called Grizzly Managed by the OpenStack Foundation

OpenStack Architecture [architecture diagram]

OpenStack Components Keystone - identity Glance - image Nova - compute Cinder - block storage Swift - object storage Quantum - network (later renamed Neutron) Horizon - dashboard

About Glance There since almost the beginning OpenStack's image store: server disk images Several formats possible: Raw, Qcow2,... Shared/non-shared

About Cinder OpenStack's block storage Quite new: part of Nova before, separated since Folsom For compute instances Managed by OpenStack users Several storage back-ends possible

About Swift There since the beginning :-) Replacement for Amazon S3 -> cloud storage Scalable Redundant OpenStack's object store: proxy, object, container, account, auth

Dream Team: CEPH and OpenStack

Why CEPH in the first place? Full-blown storage solution: support, operational model Ready for the 'cloud' type: separation of duties One storage solution for different storage needs Open source!

Integration Full integration with RADOS since Folsom Two parts: authentication and technical access Both parties must be aware CEPH components installed in the OpenStack environment Independent for each of the storage components: different RADOS pools

Integration Authentication CEPH part: key rings (ceph auth), configuration (ceph.conf) for Cinder and Glance OpenStack part: Keystone, only for Swift, needs RADOS GW

Integration Access to RADOS/RBD I Via API/libraries => no CEPHFS needed Easy for Glance/Cinder Glance: CEPH keyring configuration, update of ceph.conf, update of the Glance API configuration Cinder: CEPH keyring configuration, update of ceph.conf, update of the Cinder API configuration
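
As a hedged illustration of those API configuration updates, documentation of that era wires Glance and Cinder to RBD roughly with the following keys (pool and user names are conventions, not requirements):

    # glance-api.conf
    default_store = rbd
    rbd_store_ceph_conf = /etc/ceph/ceph.conf
    rbd_store_user = glance
    rbd_store_pool = images

    # cinder.conf
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_user = cinder
    rbd_secret_uuid = <libvirt secret UUID>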

Integration Access to RADOS/RBD II More work needed for Swift Again: CEPHFS not needed CEPH Object Gateway: web server, RGW software, Keystone certificates Does not support v2 of the Swift authentication -> Keystone authentication Endpoint configuration -> point to RADOS GW

Integration The Full Picture [Diagram: Keystone, Swift, Glance, Cinder and Nova/qemu-kvm on top; Swift reaches RADOS through the CEPH Object Gateway, Glance/Cinder/Nova through the CEPH Block Device]

Integration Pitfalls CEPH versions not in sync Authentication CEPH Object Gateway setup

Why CEPH Reviewed Previous arguments still valid :-) Modular usage High integration Built-in high availability One way to handle the different storage entities No need for a POSIX-compatible interface Works even with other IaaS implementations

Summary

Takeaways Sophisticated storage engine Mature OSS distributed storage solution Other use cases: can be used elsewhere Full integration in OpenStack


Backup Slides

CEPHFS File System Part of CEPH Replacement for NFS or other DFS Storage is just one part [Diagram: CEPH.ko, CEPH-fuse and libcephfs talk the CEPH DFS protocol to CEPH-mds and CEPH-osd]

CEPHFS Client View The first kernel part of CEPH Unusual kernel implementation: light code, almost no intelligence Communication channels: to the MDS for metadata operations, to the OSDs to access file data

CEPHFS Caches Per design OSD: identical to access of BTRFS Client: own caching Concurrent write access: caches discarded, caching disabled -> synchronous I/O HPC extension of POSIX I/O: O_LAZY

CEPH and Meta Data Server (MDS) MDSs form a cluster and don't store any data Data stored on OSDs Journaled writes with cross-MDS recovery Change to the MDS landscape: no data movement, only management information exchange Partitioning of the name space: overlaps on purpose

Dynamic Subtree Partitioning Weighted subtrees per MDS Load of the MDSs re-balanced [Diagram: name space partitioned across MDS1 to MDS5]

Meta Data Management Small set of metadata: no file allocation table, object names based on inode numbers MDS combines operations: single request for readdir() and stat(), stat() information cached

CEPH Monitors Status information of CEPH components is critical First contact point for new clients Monitors track changes of the cluster landscape Update the cluster map Propagate the information to OSDs

CEPH Storage All In One [Diagram: CEPH.ko, CEPH-fuse, libcephfs, /dev/rbd and qemu/kvm clients; CEPH DFS protocol and RBD protocol; CEPH-mds, CEPH Object Gateway and librados in front of CEPH-osd]