Reliable Replicated File Systems with GlusterFS




John Sellens
jsellens@syonex.com, @jsellens
USENIX LISA 28, 2014, November 14, 2014
Notes PDF at http://www.syonex.com/notes/

Contents
Preamble and Introduction (slide 2)
Setting Up GlusterFS Servers (slide 8)
Mounting on Clients (slide 20)
Managing, Monitoring, Fixing (slide 25)
Wrap Up (slide 33)
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 1)

Preamble and Introduction
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 2)

Overview
Network Attached Storage is handy to have in many cases
And sometimes we have limited budgets
GlusterFS provides a scalable NAS system
On normal systems and hardware
An introduction to GlusterFS and its uses
And how to implement and maintain a GlusterFS file service
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 3)
http://www.gluster.org/
We're not going to cover everything in this Mini Tutorial session
But it should get you started
In time for the mid-afternoon break!
Both USENIX and I will very much appreciate your feedback: please fill out the evaluation form

Solving a Problem
Needed to replace a small but reliable network file service
Expanding the existing service wasn't going to work
Wanted something comprehensive but comprehensible
Needed POSIX filesystem semantics, and NFS
Wanted something that would let me sleep at night
GlusterFS seemed a good fit
Supported by Red Hat; NFS, CIFS, ...
User space, on top of a regular filesystem
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 4)
I have a small hosting infrastructure that I like to implement reliably
Red Hat Storage Server is a supported GlusterFS implementation

Alternatives I Was Less Enthused About
Block replication: DRBD, HAST
Not transparent: hard to look and confirm consistency
Hard to expand, limited to two server nodes
Object stores: Ceph, Hadoop, etc.
No need for shared block devices for KVMs, etc.
Not always POSIX and NFS
Others: MooseFS, Lustre, etc.
Some needed separate meta-data server(s)
Some had single master servers
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 5)
I was running HAST on FreeBSD, and tried (and failed) to expand it
Partly due to the old hardware I was using

Why I Like GlusterFS
Can run on just two servers, all functions on both
Sits on top of a standard filesystem (ext3, XFS)
Files in GlusterFS volumes are visible as normal files
So if everything fails very badly, I can likely copy the files out
Easy to compare replicated copies of files for consistency
Fits nicely with CentOS, which I tend to use
NFS server support means that my existing FreeBSD boxes would work just fine
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 6)
I like to be both simple-minded and paranoid
So being able to check and copy if need be was appealing

Hardware: Don't Use Your Old Junk
I have some old 32-bit machines
Bad, bad idea
These days, code doesn't seem to be tested well on 32-bit
GlusterFS inodes (or equivalent) are 64 bits
Which doesn't sit well with 32-bit NFS clients
In theory 32-bit should work, in practice it's at least annoying
2^6 Yes! but 2^5 No!
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 7)
This is not just GlusterFS related
My old 32-bit FreeBSD HAST systems started misbehaving when I tried to update and expand

Setting Up GlusterFS Servers
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 8)

Set Up Some Servers
Ordinary servers with ordinary storage
All the normal speed/reliability questions
I'll suggest CentOS 7 (or 6)
Leave unallocated space to use for GlusterFS
Separate storage network? Traffic and security
Dedicated servers for storage?
Likely want storage servers to be static and dedicated
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 9)
Since Red Hat does the development, it's pretty likely that GlusterFS will work well on CentOS
Should work on Fedora and Debian as well, if you're that way inclined
GlusterFS 3.6 likely to have FreeBSD and MacOS support (I hope)
https://forums.freebsd.org/viewtopic.php?t=46923
And of course, it should go without saying, but make sure NTP, DNS, and networking are working properly.

RAID on the Servers?
GlusterFS hardware failures should be non-disruptive
RAID should provide better I/O performance
Especially hardware RAID with cache
Re-building/resilvering an entire server for a disk failure is boring
Overall storage performance will suffer in the meantime
A second failure might be a big problem
Small general purpose deployment? Use good servers and suitable RAID
Other situations may suit non-RAID
Lots of servers, more than 2 replicas, etc.
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 10)
Configuration management should mean that a server rebuild is easy
Your mileage may vary
Remember that a failed disk means lots of I/O and time to repair, and you're vulnerable to other failures while rebuilding

Networks and Security
GlusterFS has limited security and access controls
Assumption: all servers and networks are friendly
A separate storage network may be prudent
glusterfs mounts need to reach gluster peer addresses
NFS mounts by default are available on all interfaces
Generally you want to isolate GlusterFS traffic if you can
Firewalls, subnets, iptables, ... (a firewall sketch follows below)
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 11)
I have very limited experience trying to contain GlusterFS
If you're using only glusterfs mounts, an isolated network would be useful
For performance and containment
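As a rough illustration of the isolation advice above, here is a minimal iptables sketch for the storage interface. The interface name (eth1) and subnet (192.0.2.0/24) are made-up values, and the port numbers are my assumptions for a GlusterFS 3.6-era install (24007-24008 for glusterd management, 49152 and up for brick processes, one port per brick); check them against your version and brick count before relying on this.
node1# iptables -A INPUT -i eth1 -s 192.0.2.0/24 -p tcp --dport 24007:24008 -j ACCEPT   # glusterd management traffic from peers and clients
node1# iptables -A INPUT -i eth1 -s 192.0.2.0/24 -p tcp --dport 49152:49251 -j ACCEPT   # brick ports, one per brick (GlusterFS 3.4 and later)
node1# iptables -A INPUT -i eth1 -p tcp -j DROP                                          # nothing else on the storage network
node1# service iptables save
# NFS and CIFS clients talk to the floating addresses on the public interface,
# so the portmapper/NFS/Samba ports would be opened there instead.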

IPs and Addressing
Generally you will want fixed and floating addresses
GlusterFS peers need to talk to each other
glusterfs mounts need to find one peer, then talk to the others
The first peer provides details of the volumes and peers
NFS and CIFS mounts want floating service addresses
Active/passive mounts need just one
Active/active mounts need more
CTDB is recommended for IP address manipulation
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 12)
With two servers, I have 6 addresses total
Management addresses
Storage network peer addresses
Floating addresses that are normally one per server
More on CTDB later, on slide 19

Installing GlusterFS
Use the standard gluster.org repositories (see notes)
Install with:
yum install glusterfs-server
service glusterd start
chkconfig glusterd on
or: apt-get install glusterfs-server
Current version is 3.6.1
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 13)
Versions: use 3.5.x
I seemed to have less reliable/stable behaviour with 3.4
Everything is under the download link at http://download.gluster.org/pub/gluster/glusterfs/latest/
CentOS:
wget -P /etc/yum.repos.d \
  http://download.gluster.org/pub/gluster/glusterfs/latest/centos/glusterfs-epel.repo
Debian: see http://download.gluster.org/pub/gluster/glusterfs/3.5/latest/debian/wheezy/readme
A consolidated install sketch follows below.
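Putting the notes above together, a minimal CentOS install might look like the following; the repository URL is the one from the notes and the package/service names are the standard ones, but verify the current repo layout before relying on it.
node1# wget -P /etc/yum.repos.d \
    http://download.gluster.org/pub/gluster/glusterfs/latest/centos/glusterfs-epel.repo
node1# yum install glusterfs-server        # pulls in the CLI and the server daemons
node1# service glusterd start              # CentOS 6 style; redirects to systemctl on CentOS 7
node1# chkconfig glusterd on
node1# gluster --version                   # confirm what actually got installed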

A Little Terminology
A set of GlusterFS servers is a Trusted Storage Pool
Members of a pool are peers of each other
A GlusterFS filesystem is a Volume
Volumes are composed of storage Bricks
Volumes can be three types, and most combinations:
Distributed: different files are on different bricks
Striped: (very large) files are split across bricks
Replicated: two or more copies on different bricks
Distributed Replicated: more servers than replicas
A Sub-Volume is a replica set within a Volume
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 14)
Distributed provides no redundancy
Though you might have RAID disks on servers
But you're still in trouble if a server goes down

Set Up the Peers
All servers in a pool need to know each other
node1# gluster peer probe node2
Doesn't hurt to do this (I think it's optional):
node2# gluster peer probe node1
And make sure they are talking:
node1# gluster peer status
That only lists the other peer(s)
List the servers in a pool:
node1# gluster pool list
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 15)

Set Us Up the Brick
A brick is just a directory in an OS filesystem
One brick per filesystem: disk storage dedicated to a volume
/data/gluster/volname/brickN/brick
Could have multiple bricks in a filesystem: disk storage shared between volumes
/data/gluster/disk1/volname/brickN
Don't want a brick to be a filesystem mount point
Big problems if the underlying storage is not mounted
Multiple volumes? Use the latter for better utilization
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 16)
XFS is the suggested filesystem to use (a brick-preparation sketch follows below)
A suggested naming convention for bricks: http://www.gluster.org/community/documentation/index.php/howtos:brick_naming_conventions
With disk mount points, and multiple bricks per OS filesystem, one GlusterFS volume can use up space and fill up other volumes
With multiple bricks per OS filesystem, it's harder to know which gluster volume is using up space; df shows the same for all volumes
Depends on your use case
One big volume or multiple volumes for different purposes
Will volumes shrink, or only grow?
Is it convenient to have multiple OS disk partitions?
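To make the brick layout above concrete, here is a minimal sketch of preparing one XFS-backed brick filesystem. The device name /dev/sdb1 and the paths are hypothetical, and the larger inode size is just the commonly suggested XFS option for Gluster bricks, not something this tutorial requires.
node1# mkfs.xfs -i size=512 /dev/sdb1                              # hypothetical dedicated brick device
node1# mkdir -p /data/glusterfs/disk1
node1# echo '/dev/sdb1 /data/glusterfs/disk1 xfs defaults 0 0' >> /etc/fstab
node1# mount /data/glusterfs/disk1
node1# mkdir -p /data/glusterfs/disk1/vol1/brick1                  # the brick is a subdirectory, not the mount point itself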

Sizing Up a Brick
How big should a brick (partition) be?
One brick using all space on a server is easy to create
But harder to move or replace if needed
Consider using bricks of manageable size, e.g. 500GB, 1TB
Will likely be easier to migrate/replace if needed
Of course, if you have a lot of storage, a zillion bricks might be difficult
Keep more space free than is on any one server?
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 17)
I think there are some subtleties here that aren't quite so obvious
And might be worth a thought or two before you commit yourself to a storage layout that will be hard to change

Create a Volume
Volume creation is straightforward:
node1# gluster volume create vol1 replica 2 \
  node1:/data/glusterfs/disk1/vol1/brick1 \
  node2:/data/glusterfs/disk1/vol1/brick1 \
  node1:/data/glusterfs/disk2/vol1/brick2 \
  node2:/data/glusterfs/disk2/vol1/brick2
node1# gluster volume start vol1
node1# gluster volume info vol1
node1# mount -t glusterfs localhost:/vol1 /mnt
node1# showmount -e node2
Replicas are across the first two bricks, and the next two
Name things sensibly now, save your brain later
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 18)
Each brick will now have a .glusterfs directory
Adding files or directories to the volume causes them to show up in the bricks of one of the replicated pairs
You can look, but do not touch
Only change a volume through a mount, never by modifying a brick directly
Likely best to stick with the built-in NFS server
You can set options on a volume with: gluster volume set volname option value
If you're silly (like me) and have 32-bit NFS clients:
gluster volume set volname nfs.enable-ino32 on

IP Addresses and CTDB
CTDB is a clustered TDB database built for Samba
Includes IP address failover
Set up CTDB on each node: /etc/ctdb/nodes
Manage public IPs: /etc/ctdb/public_addresses
Needs a shared private directory for locks, etc.
Starts/stops Samba
Active/active with DNS round robin
(example config files are sketched below)
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 19)
Setup is fairly easy; follow these pages:
http://www.gluster.org/community/documentation/index.php/ctdb
http://wiki.samba.org/index.php/ctdb_setup
http://ctdb.samba.org/
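As a rough sketch of the two files mentioned above, the following shows plausible contents for a two-node setup; the addresses and interface name are made up, and the full set of CTDB options you need should come from the linked setup pages.
node1# cat /etc/ctdb/nodes                   # fixed storage-network address of each CTDB node, one per line, identical on both nodes
192.0.2.11
192.0.2.12
node1# cat /etc/ctdb/public_addresses        # floating addresses CTDB moves between nodes: address/netmask and interface
203.0.113.21/24 eth0
203.0.113.22/24 eth0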

Mounting on Clients
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 20)

Native Mount or NFS?
Many small files, mostly read? e.g. a web server? Use the NFS client
Write-heavy load? Use the native gluster client
Client not Linux? Use the NFS client
Or CIFS if it's a Windows client
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 21)
http://www.gluster.org/documentation/technical_faq/

Gluster Native Mount
Install glusterfs-fuse or glusterfs-client
client# mount -t glusterfs ghost:/vol1 /mnt
Use a public/floating IP/hostname for the mount
The Gluster client gets the volume info
Then uses the peer names used when adding bricks
So a gluster client must have access to the storage network
The client handles it if nodes disappear
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 22)
mount.glusterfs(8) does not mention all the mount options
In particular, the option backupvolfile-server=node2 might be useful if you don't use public/floating IPs
A sample fstab entry follows below.
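For mounting at boot, a hedged /etc/fstab sketch based on the command above; the hostname ghost and the backupvolfile-server option come from the slide and its notes, and _netdev is my assumption for making the mount wait until networking is up.
client# yum install glusterfs-fuse          # CentOS; glusterfs-client on Debian
client# cat >> /etc/fstab <<'EOF'
ghost:/vol1  /mnt  glusterfs  defaults,_netdev,backupvolfile-server=node2  0 0
EOF
client# mount /mnt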

NFS Mount
Like any other NFS mount:
client# mount glusterhost:/vol1 /mnt
Use a public/floating IP/hostname for the mount
NFS talks to that IP/hostname
So an NFS client need not have access to the storage network
NFS must use TCP, not UDP
Failover should be handled by the CTDB IP switch
But for a planned outage you might pre-plan and adjust the mount
(a sample fstab entry follows below)
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 23)
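A hedged fstab sketch for the NFS case; vers=3 and proto=tcp are my assumptions, since the built-in Gluster NFS server of this era speaks NFSv3 and the slide requires TCP rather than UDP.
client# cat >> /etc/fstab <<'EOF'
glusterhost:/vol1  /mnt  nfs  vers=3,proto=tcp,_netdev  0 0
EOF
client# mount /mnt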

CIFS Mounts
Similar to NFS mounts
Use a public/floating IP's name
Need to configure Samba as appropriate on the servers:
clustering = yes
idmap backend = tdb2
private dir = /gluster/shared/lock
CTDB will start/stop Samba
(a sample smb.conf sketch follows below)
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 24)
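To show where those three settings live, here is a minimal smb.conf sketch; the [gv1] share name and its path are hypothetical, and the path assumes the Gluster volume is also mounted locally on each server via the FUSE client, which is one common arrangement rather than the only one.
node1# cat /etc/samba/smb.conf
[global]
    clustering = yes
    idmap backend = tdb2
    private dir = /gluster/shared/lock
[gv1]
    # hypothetical share exporting a local glusterfs mount of vol1
    path = /mnt/vol1
    read only = no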

Managing, Monitoring, Fixing
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 25)

Ongoing Management
When all is going well, there's not much to do
Monitor filespace usage and other normal things
Gluster monitoring: check for
Processes running
All bricks connected
Free space
Volume heal info
Lots of logs in /var/log/glusterfs
Note well: GlusterFS, like RAID, is not a backup
(a quick manual check sketch follows below)
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 26)
I use check_glusterfs by Mark Ruys, mark.ruys@peercode.nl
http://exchange.nagios.org/directory/plugins/System-Metrics/File-System/GlusterFS-checks/details
I run it as root via SNMP
Unsynced entries (from heal info) are normally 0, but when busy there can be transitory unsynced entries
My gluster volumes are not write-heavy; you may see more unsynced entries
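If you want a quick manual version of the checks listed above before wiring up check_glusterfs, something like this hedged sketch covers the same ground with standard gluster CLI commands; deciding what counts as healthy output is up to you.
node1# gluster volume status vol1            # brick processes running, ports, online status
node1# gluster volume heal vol1 info         # entries needing heal, normally 0
node1# df -h /data/glusterfs/disk1           # free space on the brick filesystem
node1# ls -lt /var/log/glusterfs/ | head     # which logs have been busy recently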

Command Line Stuff
The gluster command is the primary tool
node1# gluster volume info vol1
node1# gluster volume log rotate vol1
node1# gluster volume status vol1
node1# gluster volume heal vol1 info
node1# gluster help
The volume heal subcommands provide info on consistency
And can trigger a heal action
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 27)

Adding More Space
Expanding the underlying filesystem provides more space
But you likely want to keep things consistent across servers
And of course you can add bricks:
node1# gluster volume add-brick vol1 \
  node1:/path/brick2 node2:/path/brick2
node1# gluster volume rebalance vol1 start
Note that you must add bricks in multiples of the replica count
Each new pair is a replica pair, just like for create
Increase the replica count by setting the new count and adding enough bricks (see the sketch below)
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 28)
If you have a replica with bricks of different sizes, you may be wasting space
You don't have to add-brick on a particular node; any server that knows about the volume should likely work fine
I'm just a creature of habit
But you can't reduce the replica count...
At least, I don't think you can reduce the replica count
A rebalance could be useful if file deletions have left bricks (sub-volumes) unbalanced
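The "increase the replica count" bullet above corresponds to passing a new replica value to add-brick. A hedged sketch, assuming a third server node3 has been probed into the pool and has bricks prepared, and that vol1 is the two-sub-volume volume created earlier; the self-heal daemon normally populates the new bricks on its own, with heal full as a manual nudge.
node1# gluster peer probe node3
node1# gluster volume add-brick vol1 replica 3 \
    node3:/data/glusterfs/disk1/vol1/brick1 \
    node3:/data/glusterfs/disk2/vol1/brick2
# one new brick per existing sub-volume, so each replica set grows from 2 copies to 3
node1# gluster volume heal vol1 full         # copy existing data onto the new bricks
node1# gluster volume info vol1              # confirm the new replica count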

Removing Space
Remove bricks with start, status, commit:
node1# gluster volume remove-brick vol1 \
  node1:/path/brick1 node2:/path/brick1 start
Replace start with status for progress
When complete, run commit
For replicated volumes, you have to remove all the bricks of a sub-volume at the same time
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 29)
This of course is never needed, because space needs never decrease

Replacing or Moving a Brick
Move a brick with replace-brick:
node1# gluster volume replace-brick vol1 \
  node1:/path/brick1 node2:/path/brick1 start
Start, status, commit, like remove-brick
If you're adding a third server to a pool with replicas
Should be able to shuffle bricks to the desired result
Or, if there's extra space, add and remove bricks
If a brick is dead, you may need commit force (see the sketch below)
With RAID, this is less of a problem...
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 30)
The Red Hat manual suggests that this is much more complicated
This is a nice description of adding a third server:
http://joejulian.name/blog/how-to-expand-glusterfs-replicated-clusters-by-one-server/
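For the dead-brick case mentioned above, a hedged sketch of the commit force variant, assuming node1's brick has failed and a replacement brick directory has been prepared on node3; after the forced commit, self-heal repopulates the new brick from the surviving replica.
node1# gluster volume replace-brick vol1 \
    node1:/path/brick1 node3:/path/brick1 commit force
node1# gluster volume heal vol1 full         # rebuild the new brick from its replica partner
node1# gluster volume heal vol1 info         # watch the unsynced entry count fall to zero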

Taking a Node Out of Service
In theory it should be simple:
node1# ctdb disable
node1# service glusterd stop
In practice, you might want to manually move NFS clients first
Clients with native gluster mounts should be just fine
On restart, volumes should self-heal
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 31)
I'm paranoid about the time it takes an NFS client to notice a new server

Split Brain Problems
With multiple servers (more than 2), it is useful to set:
node1# gluster volume set all \
  cluster.server-quorum-ratio 51%
node1# gluster volume set VOLNAME \
  cluster.server-quorum-type server
With two nodes, could add a 3rd dummy node with no storage
If heal info reports unsync'd entries:
node1# gluster volume heal VOLNAME
Sometimes a client-side stat of the affected file can fix things
Or a copy and move back (see the sketch below)
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 32)
The default quorum ratio is more than 50%
Or so the docs seem to say
The Red Hat Storage Administration Guide has a nice discussion
And lots of details on recovery
Fixing split brain: https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
Remember: do not modify bricks directly!
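A hedged sketch of the diagnosis and the stat / copy-and-move-back steps mentioned above; heal info split-brain is a standard subcommand in this era, the file path and mount point are hypothetical, and everything is done through a client mount, never on a brick.
node1# gluster volume heal vol1 info split-brain   # list files the replicas disagree on
client# stat /mnt/vol1/some/file                   # sometimes enough to trigger a self-heal
# if not, copy the good contents out and back through the mount:
client# cp /mnt/vol1/some/file /tmp/file.good
client# mv /tmp/file.good /mnt/vol1/some/file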

Wrap Up
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 33)

We Haven't Talked About
GlusterFS has many features and options:
Snapshots
Geo-Replication
Object storage, OpenStack Storage (Swift)
Quotas
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 34)
We've tried to hit the key areas to get started with Gluster
We didn't cover everything
Hopefully you've learned some of the more interesting aspects
And can apply them in your own implementations

Where to Get Gluster Help
The gluster.org web site has a lot of links
Mailing lists, IRC, ...
Quick Start Guide
Red Hat Storage documentation is pretty good
HowTo page
GlusterFS Administrator Guide
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 35)
GlusterFS documentation is currently a bit disjointed
http://www.gluster.org/
http://www.gluster.org/documentation/quickstart/index.html
The Administrator Guide is currently a link to a GitHub repository of markdown files
https://access.redhat.com/documentation/en-us/red_hat_storage/3/
http://www.gluster.org/documentation/howto/howto/

And Finally!
Please take the time to fill out the tutorial evaluations
The tutorial evaluations help USENIX offer the best possible tutorial programs
Comments, suggestions, criticisms gratefully accepted
All evaluations are carefully reviewed, by USENIX and by the presenter (me!)
Feel free to contact me directly if you have any unanswered questions, either now or later: jsellens@syonex.com
Questions? Comments? Thank you for attending!
© 2014 John Sellens, USENIX LISA 28, 2014 (slide 36)
Thank you for taking this tutorial, and I hope that it was (and will be) informative and useful for you. I would be very interested in your feedback, positive or negative, and suggestions for additional things to include in future versions of this tutorial, on the comment form, here at the conference, or later by email.