Lustre failover experience




Lustre Administrators and Developers Workshop, Paris, September 25, 2012

TOC
- Who we are
- Our Lustre experience: the environment
- Deployment
- Benchmarks
- What's next

Who we are
- Company for technology transfer
- HPC services: cluster deployment, storage solutions, training (sysadmin- and user-oriented programs), on-demand HPC

Environment
- A primary medical research institute in Italy
- Translational genomics and bioinformatics
- Personalized medicine: customization of healthcare through the use of genetic information

DNA sequencing
- High-throughput DNA sequencing: the "$1000 genome" meme
- Next Generation Sequencing is a big-data problem!

Customer needs analysis
[Figure: DNA sample read by the sequencer and written to storage as base sequences]

Customer needs analysis
- Lots of genomic data from an Illumina HiSeq 2000: to back up (~20k per run), to post-process, and to keep always available
- Data from the sequencer must be served to the computational infrastructure
- Need for a fast, high-performance, highly scalable file system with robust failover and recovery mechanisms

Infrastructure
[Figure: DNA samples enter the sequencer; an HA Samba gateway, computing nodes on InfiniBand QDR, a Fibre Channel SAN, HA Lustre OSSs and MDSs, and a tape library, interconnected over InfiniBand, Fibre Channel, and Gigabit Ethernet]

Lustre filesystem
- 2 Lustre filesystems; 2 MDSs, 3 OSSs; ~50 clients
- 60 terabytes from the SAN
- HA MDS: always available!
- MDS servers: 2 x Intel E5645 @ 2.4 GHz, 24 GB RAM (MDTs)
- OSS servers: 2 x Intel E5645 @ 2.4 GHz, 48 GB RAM (OSTs)

Lustre high availability
[Figure: Lustre clients of filesystems Lustre 1 and Lustre 2; Lustre servers MDS1, MDS2 (standby), OSS1, OSS2, OSS3 (standby) running on a high-availability stack over shared hardware with 2 x 30 TB of storage]

Lustre high availability
[Figure: the HA software stack on MDS1 and MDS2: DRBD replication (primary/secondary) above the hardware, Corosync as the messaging layer, and Pacemaker as the resource manager on top, with the crmd and stonithd daemons and the cib kept in sync between the nodes]

SAN provisioning for Lustre OSSs
- HP P2000 G3: 2 controllers, 2 x FC 8Gb, 2 TB nearline SAS 7200 rpm drives
- Each OSS serves two 10 TB RAID6 OSTs, one per filesystem: OSS1 gets OST1 of LUSTRE1 and LUSTRE2, OSS2 gets OST2 of both, OSS3 gets OST3 of both

High availability on OSSs
- Failures handled: power*, Fibre Channel*, InfiniBand* (*both links!)
- Weight distribution, scoring mechanism: each OST has a score with respect to each OSS
- An OSS mounts an OST when that OST has the highest score on that OSS
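In Pacemaker terms, this scoring mechanism maps onto location constraints. A minimal crm-shell sketch for the first OST, using the scores from the table below (resource and node names such as res_OST1_LUSTRE1 and oss1 are hypothetical, not the site's actual configuration):

```
# crm configure fragment -- illustrative sketch only
location lo-ost1l1-oss1 res_OST1_LUSTRE1 1000: oss1
location lo-ost1l1-oss2 res_OST1_LUSTRE1  800: oss2
location lo-ost1l1-oss3 res_OST1_LUSTRE1  600: oss3
```

Pacemaker starts each OST resource on the node where its score is highest; when a node fails, the remaining scores decide the new placement.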

High availability on OSSs

       OST1 L1  OST1 L2  OST2 L1  OST2 L2  OST3 L1  OST3 L2
OSS1   1000     1000     600      800      800      600
OSS2   800      600      1000     1000     600      800
OSS3   600      800      800      600      1000     1000

(L1/L2 = filesystems LUSTRE1/LUSTRE2)

[Figure: the resulting default placement, each OSS mounting the two OSTs on which it scores 1000]

High availability on OSSs
- OSS3 fails: OST3 LUSTRE1 and OST3 LUSTRE2 get a -INF score on it
- OSS1 and OSS2 each receive one of the orphaned OSTs
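The placement rule can be illustrated with a small Python sketch (a hypothetical re-implementation of what Pacemaker computes from the constraint scores, not code from the talk):

```python
# Score matrix from the table above: scores[oss][ost]
scores = {
    "OSS1": {"OST1L1": 1000, "OST1L2": 1000, "OST2L1": 600,
             "OST2L2": 800,  "OST3L1": 800,  "OST3L2": 600},
    "OSS2": {"OST1L1": 800,  "OST1L2": 600,  "OST2L1": 1000,
             "OST2L2": 1000, "OST3L1": 600,  "OST3L2": 800},
    "OSS3": {"OST1L1": 600,  "OST1L2": 800,  "OST2L1": 800,
             "OST2L2": 600,  "OST3L1": 1000, "OST3L2": 1000},
}

def place_osts(scores, failed=()):
    """Mount each OST on the surviving OSS where its score is highest
    (a failed node effectively has a -INF score for every OST)."""
    placement = {}
    for ost in next(iter(scores.values())):
        best = max((oss for oss in scores if oss not in failed),
                   key=lambda oss: scores[oss][ost])
        placement[ost] = best
    return placement

# Healthy cluster: every OSS mounts its two "home" OSTs.
print(place_osts(scores))
# OSS3 fails: each surviving OSS picks up one of the orphaned OSTs.
print(place_osts(scores, failed={"OSS3"}))
```

With OSS3 down, OST3 LUSTRE1 moves to OSS1 (score 800 vs. 600) and OST3 LUSTRE2 to OSS2, matching the behaviour described on the slide.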

Metadata target
- MDS1 and MDS2 (each primary/standby), DRBD sync over IPoIB
- Each node holds MDT1, MDT2, and the MGS
- RAID10 of HP 100GB 3G SATA MLC LFF (3.5-inch) SC Enterprise Mainstream Solid State Drives, PCIe attached
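The DRBD leg of such a setup might look like the following resource file (a sketch only: device paths, hostnames, and the IPoIB addresses are assumptions, not the site's actual configuration):

```
# /etc/drbd.d/mdt1.res -- illustrative sketch
resource mdt1 {
  protocol C;                    # synchronous replication
  device    /dev/drbd0;
  disk      /dev/sda1;           # SSD RAID10 backing the MDT
  meta-disk internal;
  on mds1 {
    address 192.168.100.1:7789;  # IPoIB address of MDS1
  }
  on mds2 {
    address 192.168.100.2:7789;  # IPoIB address of MDS2
  }
}
```

Protocol C is the natural choice here: a metadata write is acknowledged only once it has reached both nodes, so a failover never loses acknowledged metadata.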

High availability on MDSs
- Failures handled: power*, InfiniBand* (*both links!)
[Figure: HA MDS pair MDS1/standby and MDS2/standby, DRBD sync over IPoIB, each node holding MDT1, MDT2, and the MGS]

Metadata integrity

High availability on MDSs
- MDS1 fails over, MDS2 takes over: risk of split brain!
- STONITH: the failed MDS1 is fenced
[Figure: HA MDS pair, DRBD sync over IPoIB, MDS1 fenced while MDS2 serves MDT1, MDT2, and the MGS]
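Fencing the failed node is what keeps a split brain from corrupting the DRBD-replicated MDT. In Pacemaker this is a STONITH primitive, for example via IPMI (a sketch; addresses and credentials are placeholders):

```
# crm configure fragment -- illustrative sketch
primitive st-mds1 stonith:external/ipmi \
        params hostname=mds1 ipaddr=10.0.0.11 userid=admin passwd=secret
# never run a node's own fencing device on that node
location st-mds1-not-on-mds1 st-mds1 -inf: mds1
```

The -inf location constraint ensures the device that power-cycles mds1 always runs on the surviving peer.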

Lustre network

Lustre HA: SW stack
- CentOS 6.2
- Lustre 2.1.1: lustre-kernel RPM on the I/O servers, patchless client RPM modules on the clients
- lustre-iokit
- Shine
- Pacemaker/Corosync
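With clients on InfiniBand and servers also reachable over Gigabit Ethernet, the LNET side of a stack like this is typically configured with a modprobe option on every node (the interface names are assumptions):

```
# /etc/modprobe.d/lustre.conf -- typical LNET setup, sketch only
options lnet networks="o2ib0(ib0),tcp0(eth0)"
```

This tells LNET to use the o2iblnd driver on ib0 for the InfiniBand fabric and plain TCP on eth0 as the Ethernet network.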

High availability tests
- Unplug -> failover: power, InfiniBand, Fibre Channel (on OSS), InfiniBand + Fibre Channel
- Replug -> failback: automatic on OSSs, manual on MDSs
- Downtime: ~120 s
- No automatic split-brain resolution!

Performance on OSTs

                    XDD (1 thread)     sgpdd-survey (16 threads)
                    READ    WRITE      READ    WRITE
2 TB single disk    150     150        /       /
RAID6 (7 disks)     400     400        330     590

Results in MB/s.

Performance on MDTs

              File creation          Operations on directories
              16 thr.  64 thr.       16 thr.   64 thr.
MDT on SSD    1000     2600          60000     68000
MDT on HD     800      1200          30000     30000

Results in operations per second.

What's next
- Upgrade to 2.1.3: (almost) no downtime thanks to HA
- Monitoring of the HA software stack
- DRBD on MDTs

www.exact-lab.it info@exact-lab.it