World Telecom Congress 2012 Workshop on Cloud Computing in the Telecom Environment, Bridging the Gap A Filesystem Layer Data Replication Method for Cloud Computing Masanori Itoh, Kei-ichi Yuyama, Kenjirou Yamanaka March 4, 2012 NTT DATA CORPORATION
Agenda
01 Background, Motivation and Goal
02 Overall Considerations and Related Works
03 Problem Analysis and Basic Ideas
04 Design and Implementation
05 Evaluation
06 Future Work
07 Summary
Copyright 2012 NTT DATA CORPORATION
1 Background, Motivation and Goal
1. Background
Social Background
- The Great East Japan Earthquake (東日本大震災) on March 11, 2011
- Strong Need for Disaster Recovery
- Unpredictable Computing / Networking Resource Demand
  e.g., Systems for Checking People's Safety
1. Background
Technical Background
- Demand for Inter-Cloud Federation Technology: Aggregating the Resources of Multiple Cloud Systems
- Use Cases: Inter-Cloud Scale-Out, Inter-Cloud Disaster Recovery
[Diagrams: Inter-Cloud Scale-Out (services spread across Cloud A and Cloud B); Inter-Cloud Disaster Recovery (services on Cloud C evacuated to Clouds A and B after a disaster)]
1. Motivation
Motivation
- A National R&D Project Aiming at Inter-Cloud Scale-Out and Disaster Recovery in terms of Resource Control
- The Project Achieved:
  - Computing / Networking Resource Federation among Heterogeneous Multiple Cloud Systems
  - Standardization Effort: GICTF (http://gictf.jp)
- But We Still Needed to Address the Tenant Data Replication Issue in a Way Suitable for the Inter-Cloud Computing Environment
1. Goal
Goal
- An Efficient Mechanism Enabling Tenant Data Replication (e.g., Databases, Various Log Files) with Reasonable Trade-Offs in an Inter-Cloud Computing Environment
- Need to Keep Replica(s) of Data as Up-to-Date as Possible
  -> An Immediate, Synchronous Replication Mechanism
1. Goal
Requirements
1. Performance
   1. Better than Existing Solutions
   2. Sufficient Replication Throughput even in a Geographically Distributed Environment (e.g., Tokyo - Osaka)
2. Minimum Impact on a Wide Variety of (New/Existing) Systems
   1. Minimum Software (esp. Application) Modifications
   2. Reasonable Operational Impact
3. Cost Efficiency
   1. No Expensive Special Hardware
2 Overall Considerations and Related Works
2. Possible Layers
Possible Layers (Existing Solutions)
- Application Layer: User Application Dependent Implementations
- Middleware Layer: MySQL Cluster, PostgreSQL Streaming Replication, etc.
- Block Device Layer: drbd
- Hardware Layer: EMC SRDF, etc.
2. Related Works -- An Overview
Overview of Possible Layers
(Stack: Application / Middleware in user space; Filesystem / Block Device Driver in kernel space; I/O Fabric / Storage Device in hardware)
- Example 1: Application Layer -- Implemented by Application Developers
- Example 2: Middleware Layer -- MySQL Cluster, PostgreSQL Streaming Replication, etc.
- Example 3: Block Device Layer -- drbd (Distributed Replicated Block Device)
- Example 4: Hardware Layer -- EMC SRDF, etc.
- Common Point: Need to Trap Data Issued by Applications Somewhere and Transfer It to Geographically Distant Place(s)
2. Related Works -- Pros/Cons Analysis
Examples 1/2: Application/Middleware Layer (e.g., MySQL Cluster)
- Pros: Easy to Keep Consistency, Reasonable Performance
- Cons: Application/Middleware Dependent
Example 3: Block Device Layer (e.g., drbd)
- Pros: Application/Middleware/File System Neutral <- Good!
- Cons: Poor Performance, Small Room to Optimize
Example 4: Hardware Layer
- Pros: Software Neutral
- Cons: Hardware Dependent, Very Expensive, Poor Performance, Very Small Room to Optimize
2. Related Works -- Yet Another Layer
Overview of Possible Layers (same stack as before)
- Example 1: Application Layer -- Implemented by Application Developers
- Example 2: Middleware Layer -- MySQL Cluster, PostgreSQL Streaming Replication, etc.
- Example 5: Filesystem Layer -- zfs send/recv? Here is room to do something!
- Example 3: Block Device Layer -- drbd (Distributed Replicated Block Device)
- Example 4: Hardware Layer -- EMC SRDF, etc.
2. Related Works -- Pros/Cons Analysis
Examples 1/2: Application/Middleware Layer (e.g., MySQL Cluster)
- Pros: Easy to Keep Consistency, Reasonable Performance
- Cons: Application/Middleware Dependent
Example 3: Block Device Layer (drbd)
- Pros: Application/Middleware/File System Neutral <- Good!
- Cons: Poor Performance, Small Room to Optimize
Example 4: Hardware Layer
- Pros: Software Neutral
- Cons: Hardware Dependent, Very Expensive, Poor Performance, Very Small Room to Optimize
Example 5: File System Layer
- Pros: Application/Middleware Neutral, Large Room to Optimize
- Cons: Needs Kernel-Level Programming
3 Problem Analysis and Basic Ideas
3. Problem Statement
Poor Performance
- The Lower the Layer a Replication Mechanism Is Implemented at, the More Sensitively Its Throughput Is Affected in a Geographically Distributed Environment (a Long Fat Pipe, LFP)
  -> Find Out the Best Place / Way to Do Replication in terms of Performance
- Insufficient Tenant Data Replication Performance Relative to Network Line Investment
3. Problem Analysis and Basic Ideas
drbd Replication
- Transmits Each (Random) Write I/O Request to the Remote Site
  -> Inherently Uses Short Packets -> Poor Throughput
- Secure Replication Is Provided Only by Protocol C, Which Waits for I/O Completion at the Remote Site
  -> Affects the Latency of Source-Side I/O Requests
Idea
- Make Use of the Filesystem Journal
  - Naturally Converts Random (Write) I/Os into Sequential I/Os
  - Aggregates Multiple (Random) I/O Payloads
  -> A Good Place to Implement Tenant Data Replication
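The contrast above can be illustrated with a toy model (hypothetical helper names, not the actual drbd or jbd2 code): a block-layer replicator ships every random write as its own short message, while a journal commit aggregates the same writes into one sequential record plus a small metadata header.

```python
# Toy illustration of the basic idea: journal commits aggregate random
# writes into one sequential payload instead of many short transmissions.

def block_layer_messages(writes):
    """drbd-like behavior: one network message per write request."""
    return [(offset, data) for offset, data in writes]

def journal_commit(writes):
    """Journal-like behavior: one transaction record for all writes.
    The header records where each payload belongs on disk."""
    header = [(offset, len(data)) for offset, data in writes]  # metadata
    payload = b"".join(data for _, data in writes)             # aggregated data
    return header, payload

# Three random 512-byte writes scattered over the device
writes = [(4096, b"A" * 512), (81920, b"B" * 512), (512, b"C" * 512)]
msgs = block_layer_messages(writes)       # 3 separate short transmissions
header, payload = journal_commit(writes)  # 1 sequential 1536-byte payload
```

Transmitting one large sequential record per commit keeps packets full and amortizes the round-trip cost, which is the property the filesystem-layer approach exploits.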
4 Design and Implementation
Design -- Overall Architecture
Overall Architecture
- Primary Site (e.g., Tokyo): Applications and Middleware issue I/O requests (system calls) to a journal-based filesystem; a kernel-space Journal Trapper Subsystem accumulates journal data (both metadata and I/O request payloads); a user-space Setup Utility configures it
- Wide Area Network (Internet / Dedicated Line): trapped journal data is transferred to the backup site
- Backup Site (e.g., Osaka): a Journal Data Receiver stores the incoming journal data alongside an initial snapshot; on recovery, a Recovery Tool applies the journal data to the snapshot
4. Design and Implementation
Principles of Operation: Source Side
1. Take a Snapshot of the Source File System (Partition Image, e.g., sdb1) and Transfer It to the Remote Site
2. Mount the Source File System and Establish Connection(s) with the Remote Site
3. Begin Journal Data Transfer (Both Metadata and Filesystem Payload)
Principles of Operation: Receiver Side
1. Receive Journal Data and Store It Locally/Sequentially
Principles of Operation: On Recovery
1. Apply the Journal Data to the Snapshot
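The recovery step can be sketched as replaying the sequentially stored journal records onto the snapshot image, in commit order (a minimal model with a hypothetical record format, not the prototype's actual recovery tool):

```python
# Sketch of the "On Recovery" step: replay journal records onto a snapshot.
# A record is modeled as (byte offset, payload); real jbd2 records carry
# block numbers and descriptor blocks, which this toy model omits.

def apply_journal(snapshot: bytearray, journal):
    """Replay records in commit order so that later writes to the same
    location win, reconstructing the latest replicated state."""
    for offset, data in journal:
        snapshot[offset:offset + len(data)] = data
    return snapshot

snapshot = bytearray(32)  # stands in for the transferred partition image
journal = [
    (0, b"meta"),      # first commit
    (8, b"payload!"),  # second commit
    (0, b"META"),      # third commit overwrites the first
]
restored = apply_journal(snapshot, journal)
```

Because the receiver stores records strictly sequentially, replay order equals arrival order, and no sorting or conflict resolution is needed at recovery time.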
4. Design and Implementation
Prototype Implementation
- Base Platform: Fedora 14 (x86_64) + Fedora 15 kernel (linux-2.6.37-2.fc15)
- Base Filesystem: ext4 + jbd2
- Source Lines:
  - Trapper (Modified jbd2 driver): 4K
  - Setup Utility (user land): 1K
  - Receiver (user land): 4K
  - Recovery Tool (user land): 1K
  - In Total: ~10K steps (Including a Bunch of Debug Code)
  - c.f. drbd source lines: kernel 30K + user land 30K
4. Design and Implementation
Optimizations in the Prototype Implementation
1. Use Multiple TCP Connections per Mount
   - Avoids Modification to the TCP/IP Protocol Stack
2. Overlap Local Journal I/O and Transmission over the TCP Connections
   - Makes Use of Parallelism and Issues Transmissions Frequently
3. ext4 Mount Options with Respect to Journaling
   - Existing Modes: data=ordered (default), data=journal, data=writeback
   - Created a Combined Mode of data=ordered and data=journal:
     on the Source Side, Write Metadata Only to the Local Journal,
     but Transfer Both Metadata and Data to the Receiver Side
5 Evaluation
5. Evaluation
Features Test
- File Contents and Their Metadata Are Restored Correctly
Performance Measurement
- Hardware/Software:
  - Xeon L5520 2P4C, 32GB RAM, 146GB SAS HDD (RAID 1) x 2, GbE NIC
  - 2Gbps FC RAID, 146GB Volume (RAID 10)
  - Fedora 14 (x86_64) + Modified Fedora 15 kernel (2.6.37-2.fc15)
- Network Delay Generator: Linux netem (i.e., the tc command)
- Benchmarks:
  - bonnie++ 1.96
  - pgbench (PostgreSQL 9.1.0), scaling factor = 256, clients = 64
5. Evaluation
Emulated Geographically Distributed Environment
- One-Way Latency: 10ms via netem (~ Tokyo - Osaka)
- I/O Pattern (Benchmark): bonnie++, pgbench
- Receiver-Side Behavior: the Receiver Sends Back ACKs after I/O Completion (Equivalent to drbd Protocol C)
[Diagram: Source Node -> Router (10ms delay each way) -> Receiver Node; journal data flows from source to receiver, which stores it alongside the snapshot]
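A back-of-the-envelope model (illustrative numbers only, not from the paper) shows why per-write synchronous replication collapses over this link while batched journal transfer does not: each acknowledged round trip costs 2 x 10ms = 20ms, so throughput is roughly the payload acknowledged per round trip divided by the RTT.

```python
# Rough throughput model for synchronous (ACK-per-transfer) replication
# over the emulated 10ms one-way delay, ignoring link transmission time.

ONE_WAY_S = 0.010          # netem delay, as in the evaluation
RTT_S = 2 * ONE_WAY_S      # 20 ms per synchronous round trip

def sync_throughput_mb_s(bytes_per_round_trip):
    """Payload acknowledged per round trip, divided by the RTT, in MB/s."""
    return bytes_per_round_trip / RTT_S / 1e6

per_write = sync_throughput_mb_s(4096)           # one 4 KiB write per ACK
per_commit = sync_throughput_mb_s(1024 * 1024)   # a 1 MiB aggregated commit
# per_write lands around 0.2 MB/s, the same order as the drbd Protocol C
# measurements on the next slides, while an aggregated commit is far higher.
```

The model suggests latency, not bandwidth, dominates over a long fat pipe, which is exactly what aggregation (and multiple in-flight connections) attacks.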
5. Evaluation
Results: bonnie++ -- Performance Impact: 10 Times Faster than DRBD Protocol C
Sequential Write (block) Throughput:
- without Overlap, 1 connection: 0.19 MB/s
- without Overlap, 10 connections: 1.77 MB/s
- without Overlap, 500 connections: 26 MB/s
- with Overlap, 500 connections: 33 MB/s
- DRBD (Protocol C): 3.3 MB/s
-> 10 Times Better!
5. Evaluation
Results: bonnie++ -- Performance Impact: 10 Times Faster than DRBD Protocol C
Sequential Write (block) Throughput:
- without Overlap, 1 connection: 0.19 MB/s
- without Overlap, 10 connections: 1.77 MB/s
- without Overlap, 500 connections: 26 MB/s
- with Overlap, 500 connections: 33 MB/s
- DRBD (Protocol C): 3.3 MB/s
- Upper Limit (GbE): 125 MB/s
- No Replication (Base): 158 MB/s <- Need to Investigate
6 Future Work
6. Future Work
- More Detailed Analysis (Especially of Performance): Packet-Level Analysis, etc.
- Further Evaluation: Try Other Application-Level Benchmarks
- Further Optimization: Optimize Journal Data Transmission Timing; Use SSDs on the Receiver Side
- Multiple-Tier Replication: Data Chaining
- Use of a Secure Communication Channel (SSL?)
- Integration with the Inter-Cloud Federation Manager
- Other File Systems (e.g., jfs2, zfs?)
7 Summary
7. Summary
Proposed Technique
- A Journal-Based, File System Layer Tenant Data Replication Method
Features
- Application/Middleware Transparent
- Suitable for the Inter-Cloud Computing Environment
- High Performance: 10 Times Better than drbd, with Lots of Room for Further Optimization
- Generically Applicable to Any Journal-Based File System
- Minimum Implementation Impact
Acknowledgement
This work is funded by the Ministry of Internal Affairs and Communications (総務省), Japan.
- FY2010 R&D of Information Communication Technologies (平成22年度 情報通信技術の研究開発)
  http://www.soumu.go.jp/menu_news/s-news/02tsushin03_000024.html
- III: R&D of Highly Available and Green Network Control Technology for Cloud Services (クラウドサービスを支える高信頼・省電力ネットワーク制御技術の研究開発)
- III-1: Technologies for a Highly Available Cloud Service Control Foundation (高信頼クラウドサービス制御基盤技術)
Thank You!
Q&A