Integrity of In-memory Data Mirroring in Distributed Systems
Tejas Wanjari, EMC Data Domain




Problem Definition
- In-memory data is constantly changing
- Disk checksums cover an older state of the data
- Mirroring therefore cannot rely on disk checksums
- Undetected corruption is not acceptable
- Reliability is paramount (e.g., backup/recovery systems)

Sources of Corruption during Mirroring (and the usual protection against each)
- System failure: clean shutdown and reboot
- Hardware failure: redundant failover
- Disk failure: disk/filesystem checksums
- Process corruption: avoid copying without checksums
- Network corruption: application/protocol checksums

TCP Checksum Vulnerability
- TCP checksum: only 16 bits (2 bytes) and weak
- Prone to undetected errors, which this talk calls false positives (FPs): wrong data, correct checksum
- Failure probability: 1 in 16 million to 1 in 10 billion packets, for packets of 1526 bytes [1] (Stone et al., "When the CRC and TCP checksum disagree")
- At 1526 bytes per packet, this implies roughly one undetected TCP corruption per 24 GB to 15 TB of data
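
The weakness can be illustrated with a small sketch. The function below is a plain 16-bit ones'-complement sum of the kind TCP uses (the real TCP checksum also covers a pseudo-header, which is omitted here). Because addition is commutative, swapping two 16-bit words corrupts the data without changing the checksum. The sample payload and the helper `data_per_undetected_error` are illustrative assumptions; the helper simply multiplies the per-packet failure rate by an assumed fixed packet size of 1526 bytes.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum over data (no TCP pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


# Hypothetical payload; any data with two differing 16-bit words works.
original = b"mirror block A, mirror block B!!"
# Corrupt it by swapping the first two 16-bit words.
corrupted = original[2:4] + original[0:2] + original[4:]

print(original != corrupted)
print(internet_checksum(original) == internet_checksum(corrupted))


def data_per_undetected_error(packets_per_error: int, packet_bytes: int = 1526) -> int:
    """Bytes transferred, on average, per undetected error (assumes fixed-size packets)."""
    return packets_per_error * packet_bytes


print(data_per_undetected_error(16_000_000))      # lower bound from [1]
print(data_per_undetected_error(10_000_000_000))  # upper bound from [1]
```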

Strong checksum in Application?
- Performance overhead
- Application data structures differ from network data structures (e.g., B-tree data must be split to fit the MTU)
- The interconnect or network is the vulnerability, not the application
- Ends up reinventing the transport protocol in the application (over TCP!): handling retransmissions, in-order delivery, gaps, etc.
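
To make the cost concrete, here is a minimal sketch of what an application-level strong checksum looks like. The wire format (4-byte length, 16-byte MD5 digest, then payload) and the names `frame` and `unframe` are assumptions for illustration, not the format used in the talk. Note that on a mismatch the application itself has to arrange a resend, which is the "reinventing the transport protocol" problem the slide describes.

```python
import hashlib
import struct

# Hypothetical header: payload length (4 bytes, network order) + MD5 digest (16 bytes).
HEADER = struct.Struct("!I16s")


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and MD5 digest."""
    return HEADER.pack(len(payload), hashlib.md5(payload).digest()) + payload


def unframe(message: bytes) -> bytes:
    """Verify and strip the header; raise ValueError if the payload is damaged."""
    length, digest = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length or hashlib.md5(payload).digest() != digest:
        # TCP has already acknowledged these bytes, so the application
        # must implement its own resend request and reordering logic.
        raise ValueError("digest mismatch: application-level resend required")
    return payload
```

Every message also crosses the user/kernel boundary once for the send and again for any digest work done through a kernel interface, which is where the extra copies come from.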

Ideal Solution
- Zero-copy: avoid multiple copies without checksums
- Hardware redundancy for hardware failures
- Clean shutdowns on system failures
- Filesystem/block/disk checksums for disk reliability
- Bridging the integrity gap in the network/interconnect: protection in the transport protocol

Why reinvent the wheel?
- RFC 2385: TCP MD5 Signature Option
- Implemented in the Linux kernel as the TCP_MD5SIG socket option
- Linux implementation:
  - Efficient computation (uses the kernel crypto engine)
  - Retransmission on checksum mismatch, which implies seamless error recovery
  - Fewer syscalls: the digest is computed inside the kernel instead of through /dev/crypto from userspace, hence fewer copy_to_user/copy_from_user calls and a smaller memory footprint

Working of the TCP_MD5SIG socket option
- Before the connection is set up, both client and server must know each other's:
  - IP address
  - Port
  - MD5 key
- The client must bind() to a known port so that the server can save the <IP, Port, MD5Key> mapping in advance
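
The setup described above can be sketched as follows. This is a minimal, Linux-only illustration, assuming the `struct tcp_md5sig` layout from `<linux/tcp.h>` (a 128-byte `sockaddr_storage`, then flags, prefix length, key length, padding, and an 80-byte key) and IPv4 peers. The constant is defined locally because Python's `socket` module may not expose it; `pack_tcp_md5sig` and `enable_tcp_md5sig` are hypothetical helper names, and the address, port, and key in any call are placeholders.

```python
import socket
import struct

TCP_MD5SIG = 14            # option number from <linux/tcp.h> (assumed, Linux only)
TCP_MD5SIG_MAXKEYLEN = 80  # maximum key length accepted by the kernel


def pack_tcp_md5sig(peer_ip: str, peer_port: int, key: bytes) -> bytes:
    """Build a struct tcp_md5sig for an IPv4 peer."""
    if not 0 < len(key) <= TCP_MD5SIG_MAXKEYLEN:
        raise ValueError("key must be 1..80 bytes")
    # struct sockaddr_in: family (host order), port and address (network order).
    sockaddr_in = (
        struct.pack("=H", socket.AF_INET)
        + struct.pack("!H", peer_port)
        + socket.inet_aton(peer_ip)
    )
    sockaddr_storage = sockaddr_in.ljust(128, b"\x00")
    # tcpm_flags, tcpm_prefixlen, tcpm_keylen, padding, tcpm_key[80]
    tail = struct.pack("=BBHI80s", 0, 0, len(key), 0, key)
    return sockaddr_storage + tail


def enable_tcp_md5sig(sock: socket.socket, peer_ip: str, peer_port: int, key: bytes) -> None:
    """Register the peer's MD5 key on sock; call before connect() or accept()."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_MD5SIG,
                    pack_tcp_md5sig(peer_ip, peer_port, key))
```

In use, the server would call `enable_tcp_md5sig` on its listening socket with the client's address and agreed port, and the client would `bind()` to that port, call `enable_tcp_md5sig` with the server's address, and then `connect()`. The call fails with an `OSError` on kernels built without TCP MD5 support.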

TCP_MD5SIG over Socket

Evaluation: Latency
[Figure: round-trip time (microseconds) vs. TCP payload (256 to 8192 bytes) for no-md5, TCP_MD5SIG, and User MD5. Setup: 1 thread, 1 socket, 1 CPU; 9000-byte MTU; 10 Gbps Ethernet.]

Evaluation: Throughput (Single-threaded)
[Figure: throughput (Gbps) vs. TCP payload (256 to 8192 bytes) for TCP_MD5SIG and User MD5. Setup: 1 thread, 1 socket, 1 CPU; 9000-byte MTU; 10 Gbps Ethernet.]

Evaluation: Throughput - TCP_MD5SIG (Multi-threaded)
[Figure: throughput (Gbps) vs. TCP payload (256 to 8192 bytes) with 10, 20, 30, 40, 50, and 60 threads. Setup: 60 CPU cores; 9000-byte MTU; 10 Gbps Ethernet.]

Evaluation: Memory Footprint (accessing /dev/crypto from userspace)
[Figure: memory footprint (bytes) of copy_from_user / copy_to_user vs. TCP payload (256 to 8192 bytes) for TCP_MD5SIG and User MD5. Annotations: TCP_MD5SIG: 2 socket syscalls; User MD5: 2 socket syscalls.]

Use-Case: NVM Mirroring
[Figure, two panels: throughput in reqs/s and throughput in Mbps, each vs. TCP payload (8 bytes to 128K bytes), for TCP_MD5SIG and no-md5.]
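
The two throughput axes are related by the payload size. A minimal sketch of the conversion, assuming that Mbps counts payload bits only (no TCP/IP header overhead) and uses decimal megabits; the function name and the sample numbers are illustrative, not measurements from the talk:

```python
def reqs_to_mbps(reqs_per_sec: float, payload_bytes: int) -> float:
    """Convert a request rate to payload throughput in megabits per second."""
    return reqs_per_sec * payload_bytes * 8 / 1_000_000


# Hypothetical example: 1000 requests per second at 125 bytes each.
print(reqs_to_mbps(1000, 125))
```

This is why small payloads can show a high request rate while contributing very little bandwidth, and large payloads the reverse.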

Conclusion
- Not a generic solution. First try other fits:
  - Is the TCP checksum really not good enough for the application?
  - Disk/filesystem checksums
  - Disk/flash mirroring
- But very effective for typical use cases
- For line-speed mirroring of in-memory data:
  - Better throughput and memory footprint, at the same latency
  - Error detection and recovery are seamless to the application
- Future prospects: persistent memory

References
[1] Jonathan Stone and Craig Partridge. 2000. When the CRC and TCP checksum disagree. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '00). ACM, New York, NY, USA, 309-319. DOI: 10.1145/347059.347561. http://doi.acm.org/10.1145/347059.347561
[2] TCP_MD5SIG: An Undocumented Socket Option in Linux. http://criticalindirection.com/2015/05/12/tcp_md5sig/
[3] Iperf3 with TCP_MD5SIG (patch being submitted). https://github.com/tejaswanjari/iperf
[4] Linux Kernel Source Tree. www.kernel.org

Thank you! Questions?