Proactive, Resource-Aware, Tunable Real-time Fault-tolerant Middleware




Priya Narasimhan, T. Dumitraş, A. Paulos, S. Pertet, C. Reverte, J. Slember, D. Srivastava
Carnegie Mellon University

Problem Description
- CORBA is increasingly used for applications where dependability and quality of service are important
- The Real-Time CORBA (RT-CORBA) standard
- The Fault-Tolerant CORBA (FT-CORBA) standard
- But neither of the two standards addresses its interaction with the other: either real-time support or fault-tolerant support, but not both
- Applications that need both RT and FT are left out in the cold

Focus of MEAD:
- Why real-time and fault tolerance do not make a good marriage
- Overcoming these issues to build support for CORBA applications that require both real-time and fault tolerance

Real-Time Systems
- Require a priori knowledge of events
- Operations ordered to meet task deadlines
- RT-determinism: bounded, predictable temporal behavior
- Multithreading for concurrency and efficient task scheduling
- Use of timeouts and timer-based mechanisms

Fault-Tolerant Systems
- No advance knowledge of when faults might occur
- Operations ordered to preserve data consistency (across replicas)
- FT-determinism: coherent state across replicas for every input
- FT-determinism prohibits the use of multithreading
- FT-determinism prohibits the use of local processor time
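To see why multithreading clashes with FT-determinism, consider the minimal C++ sketch below (an illustration, not MEAD code): two concurrent requests update one replica's state, and the final value depends on the thread schedule, so two replicas given identical inputs can diverge.

    // Two "requests" arrive concurrently at one replica. The mutex keeps
    // each update atomic, but the *order* of the updates is chosen by the
    // thread scheduler, which can differ from replica to replica.
    #include <iostream>
    #include <mutex>
    #include <thread>

    int main() {
        int state = 1;
        std::mutex m;
        std::thread a([&] { std::lock_guard<std::mutex> g(m); state *= 2; });  // request A
        std::thread b([&] { std::lock_guard<std::mutex> g(m); state += 3; });  // request B
        a.join();
        b.join();
        // A-then-B yields (1*2)+3 = 5; B-then-A yields (1+3)*2 = 8.
        // Identical inputs, scheduler-dependent state: FT-determinism is lost.
        std::cout << "state = " << state << "\n";
    }

Compile with -pthread.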

Technology Development
- Trade-offs between RT and FT for specific scenarios
- Effective ordering of operations to meet both RT and FT requirements
- Proactive fault tolerance to meet both RT and FT requirements
- Impact of fault tolerance and real-time on each other:
  - Impact of faults/restarts on real-time behavior
  - Replication of scheduling/resource-management components
  - Scheduling (and bounding) recovery to avoid missing deadlines
- Additional features:
  - Tools for (re-)configuring fault tolerance in a resource-aware manner
  - Tools for the structured injection of different kinds of faults
  - Metrics (and measurement techniques) for objectively evaluating fault tolerance

MEAD Architectural Overview
- Use replication to protect application objects and the scheduler / global resource manager
- Special RT-FT scheduler:
  - Real-time, resource-aware scheduling service
  - Fault-tolerance-aware, to decide when to initiate recovery
- Hierarchical resource-management framework:
  - Local resource managers feed into a replicated global resource manager
  - The global resource manager coordinates with the RT-FT scheduler
- Ordering of operations keeps replicas consistent in state despite faults, missed deadlines, recovery, and non-determinism in the system (see the sketch below)
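The last point is classical state-machine replication. The C++ sketch below (a simplified illustration under our own assumptions, not MEAD's implementation) shows the core invariant: replicas that apply the same totally ordered sequence of deterministic operations end up in the same state; in MEAD the total order comes from the underlying group communication system.

    #include <iostream>
    #include <vector>

    struct Op { int delta; };  // a deterministic operation on replica state

    struct Replica {
        long state = 0;
        void apply(const Op& op) { state += op.delta; }
    };

    int main() {
        // The same totally ordered log is delivered to every replica.
        std::vector<Op> log = {{5}, {-2}, {7}};
        Replica a, b;
        for (const Op& op : log) { a.apply(op); b.apply(op); }
        std::cout << (a.state == b.state ? "replicas consistent\n"
                                         : "replicas diverged\n");
    }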

Fault Model for MEAD
Kinds of faults that MEAD is designed to tolerate:
- Crash faults: hardware and/or OS crashes in isolation; process and/or object crashes
- Communication faults: message loss and message corruption; network partitioning
- Resource-exhaustion faults: running out of memory, CPU, bandwidth
- Omission faults: missed deadlines in a real-time system

Not tolerated in the current version:
- Malicious faults (commission/Byzantine): processor/process/object maliciously subverted


MEAD Interceptors for Transparency
An interceptor is a user-level extension that adds new capabilities and works with:
- Unmodified operating systems
- Unmodified ORBs and JVMs
- Unmodified applications

MEAD employs interception to:
- Plug in fault tolerance at run-time
- Profile the application for its patterns of communication and resource usage
- Provide run-time support for different aspects of FT and RT
- Mix and match serial and parallel interceptors, at run-time, to achieve specific goals

[Diagram: an intercepted middleware process, with the MEAD interceptor sitting between the original library and the operating system to provide the modified functionality]
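One common way to build such an interceptor over an unmodified OS and ORB is library interposition: a preloaded shared object overrides a libc symbol, runs its own logic, and forwards to the real implementation. The C++ sketch below is a generic example of that mechanism, not MEAD's actual interceptor; here it merely profiles outgoing messages, one of the uses listed above.

    // Build: g++ -shared -fPIC -o libintercept.so intercept.cpp -ldl
    // Use:   LD_PRELOAD=./libintercept.so ./unmodified_application
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE  // for RTLD_NEXT
    #endif
    #include <dlfcn.h>
    #include <cstdio>
    #include <sys/socket.h>

    extern "C" ssize_t send(int fd, const void* buf, size_t len, int flags) {
        // Resolve the real send() once, on first use.
        using send_fn = ssize_t (*)(int, const void*, size_t, int);
        static send_fn real_send =
            reinterpret_cast<send_fn>(dlsym(RTLD_NEXT, "send"));

        // Interceptor logic goes here: e.g., record message sizes to
        // profile the application's communication patterns.
        std::fprintf(stderr, "[interceptor] send(fd=%d, %zu bytes)\n", fd, len);

        return real_send(fd, buf, len, flags);  // forward to the original
    }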

Proactive Fault-Tolerance: Overview
- Involves predicting, with some confidence, when a failure might occur, and compensating for it even before it occurs
- For instance, if we knew that a processor had an 80% chance of failing within the next 5 minutes, we could perform process migration
- Our goals in MEAD:
  - Lower the impact that faults have on real-time schedules
  - Implement proactive dependability in a transparent manner
- Proactive dependability has two aspects:
  - Fault prediction: reducing the unpredictable nature of faults
  - Proactive recovery: reducing fail-over times and the number of failures experienced at the application level (the primary focus in MEAD)
- Complements, but does not replace, classical reactive fault-tolerance schemes, since we cannot predict every fault

Experimental Results
Setup:
- Run on 5 PC850 Emulab nodes (100 Mbps LAN, UAV-RHL73-STD operating system)
- 4 server replicas, 1 client
- Warm passive replication, with a memory-leak fault injected
- Round-robin algorithm used for failover

Recovery schemes:
- At the client: reactive recovery at the client application (with and without cached object references)
- At the interceptor:
  - On abrupt failure, the client contacts the group for the next server reference
  - Proactive failover (i.e., failover while resource usage is still below 100%), sending either a GIOP LOCATION_FORWARD message or a custom MEAD message that causes the client to reconnect to the next replica transparently (see the sketch below)
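The proactive failover in this setup is threshold-triggered: the server's resource usage is watched and clients are redirected before the leaking replica is exhausted. The C++ sketch below is a simplified illustration of that trigger; the stubs are placeholders and the 60% threshold mirrors the experiment, but none of this is MEAD's actual code.

    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Illustrative stubs: in MEAD these roles belong to the resource
    // monitor and to the interceptor/group-communication machinery.
    static double usage = 0.0;
    double memory_usage_fraction() { return usage += 0.07; }  // simulated leak
    void migrate_clients_to_next_replica() {
        std::puts("proactive failover: redirecting clients to the next replica");
    }

    int main() {
        const double kThreshold = 0.60;  // fail over at 60% resource usage
        while (memory_usage_fraction() < kThreshold)
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        // The replica is still healthy at this point, so clients can be
        // redirected without ever observing a failure.
        migrate_clients_to_next_replica();
    }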

Summarized Results
[Charts: round-trip time in microseconds over roughly 10,000 runs, for reactive recovery in the client application (no cache) and for proactive recovery with the MEAD message (failover at 60%)]

Overhead in fault-free runs:
- GIOP LOCATION_FORWARD: 61%
- MEAD message: 5%

Approximate failover times:
- Reactive recovery at the client application (no cache): 9400 µs (COMM_FAILUREs)
- Proactive recovery with the GIOP message: 11000 µs
- Proactive recovery with the MEAD message: 3400 µs

Failures:
- None visible at the client side in the proactive schemes
- In the reactive schemes, there is a 1:1 correspondence between client-side and server-side failures

Benefits for Operational Scenarios
- Provides a framework for proactive recovery that is transparent to the client application
- Proactive recovery can:
  - Significantly reduce failover times, lowering the impact of a failure on real-time schedules
  - Reduce the number of failures experienced at the application level
  - Provide advance warning of failures to other servers further down the line (multi-tiered applications)
  - Request the recovery manager to launch new replicas so that a consistent number of replicas is retained in the group (useful for active replication, where a certain number of servers is required to reach agreement)
- Caveat: not applicable to every kind of fault, of course

Versatile Dependability
- Resources: CPU usage, bandwidth, energy/power, memory usage, number of nodes, storage space
- Fault-tolerance knobs: strength of fault model, group-communication style, FT granularity, number of faults tolerated, frequency of failures, window of vulnerability
- Timing and performance metrics: overhead of FT, fault-detection latency, replica-launch latency, fault-recovery latency, number of missed deadlines, scheduling algorithms

Resource-Aware RT-FT Scheduling
- Requires the ability to predict and to control resource usage
  - Example: virtual memory is too unpredictable/unstable for real-time use; RT-FT applications that use virtual memory need better support
- Needs input from the local and global resource managers
  - Resources of interest: load, memory, network bandwidth
  - Parameters: resource limits, current resource usage, usage-history profile
- Uses resource-usage input for:
  - Proactive action: predict and perform new resource allocations; migrate resource-hogging objects to idle machines before they start executing (see the prediction sketch below)
  - Reactive action: respond to overload conditions and transients; migrate replicas of offending objects to idle machines even as they are executing invocations
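One plausible primitive for the "predict and perform new resource allocations" step is an exponentially weighted moving average over the usage-history profile; the C++ sketch below uses it to decide when to migrate a resource-hogging object. The smoothing factor and the 80% trigger are illustrative choices, not MEAD parameters.

    #include <iostream>

    // EWMA over a stream of usage samples in [0, 1].
    class UsagePredictor {
        double ewma_ = 0.0;
        const double alpha_;  // weight given to the newest sample
    public:
        explicit UsagePredictor(double alpha) : alpha_(alpha) {}
        double update(double sample) {
            ewma_ = alpha_ * sample + (1.0 - alpha_) * ewma_;
            return ewma_;
        }
    };

    int main() {
        UsagePredictor cpu(0.5);
        for (double sample : {0.2, 0.4, 0.7, 0.9, 0.95}) {  // rising load
            if (cpu.update(sample) > 0.8)  // predicted overload
                std::cout << "proactive action: migrate object to an idle machine\n";
        }
    }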

Versatile Dependability: System Architecture
Design goals:
- One infrastructure, multiple knobs
- Quantifiable trade-offs: FT vs. RT vs. resources (benchmarking)
- Dynamic adaptation to working conditions
- Transparency

Versatile Dependability: Overheads
- Fault-tolerance overhead varies
  - Active replication has less overhead than warm passive replication
  - Comparing the application with and without replication, overheads are currently in the range of 200%
  - Most of the overhead comes from the group communication system used underneath; we are looking at configuring this better
  - Not yet optimized
- Resource usage (e.g., bandwidth) varies too
  - Active: uniformly distributed across replicas
  - Warm passive: not uniform; the primary is worse than the backups
- Experiments on reactive recovery follow

Resource Usage Experiments
[Charts: resource usage at the server replicas and at the client]

Scalability Experiments (Clients/Replicas)
[Charts]

Reactive Fault-Tolerance Experiments
[Charts]

Fault-Tolerance Advisor
Inputs:
- From the application: application characteristics (size of state, invocation patterns, etc.) and reliability requirements (recovery time, faults to tolerate)
- From the middleware: run-time profile of resource usage
- From the environment: operating system, network speed/type, configuration, workstation speed/type

Outputs:
- RT-FT schedule
- Number of replicas
- Locations of replicas
- Replication style
- Checkpointing rate
- Fault-detection rate

Fault-Tolerance Advisor
- Fault-tolerance configuration of reliable applications is often ad hoc, done by hand with little or no optimization
- Reliability and performance can be greatly improved by tuning the fault-tolerance parameters: replication style, checkpointing rate, fault-detection rate, number of replicas, locations of replicas
- Static fault-tolerance configurations quickly go out of tune as systems change
- Dynamic re-tuning can adapt to changing system characteristics: varying fault rates, system load, resource availability, reliability requirements
- The Fault-Tolerance Advisor gives deployment-time and run-time advice to tune fault-tolerant applications running over MEAD (see the configuration sketch below)
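As a concrete, purely illustrative picture of what the Advisor tunes, the C++ sketch below models the configuration knobs named above and one trivial re-tuning rule; the field names and the rule are our assumptions, not MEAD's interfaces.

    #include <string>
    #include <vector>

    enum class ReplicationStyle { Active, WarmPassive };

    // The knobs the Advisor sets at deployment time and re-tunes at run-time.
    struct FaultToleranceConfig {
        ReplicationStyle style = ReplicationStyle::WarmPassive;
        unsigned num_replicas = 3;
        std::vector<std::string> replica_hosts;    // locations of replicas
        double checkpoint_interval_ms = 500.0;     // checkpointing rate
        double fault_detection_period_ms = 100.0;  // fault-detection rate
    };

    // A toy run-time rule: under a rising observed fault rate, detect faster.
    FaultToleranceConfig retune(FaultToleranceConfig cfg, double faults_per_sec) {
        if (faults_per_sec > 0.01)
            cfg.fault_detection_period_ms /= 2;
        return cfg;
    }

    int main() {
        FaultToleranceConfig cfg;
        cfg = retune(cfg, 0.02);  // system characteristics have changed
    }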

Fault-Tolerance Advisor Architecture
- MEAD Manager components enforce the FT configuration: factory, fault detector, resource monitor / interceptor library, fault-tolerance advisor interface
- The Fault-Tolerance Advisor dynamically tunes the fault-tolerance configuration
- Decentralized architecture, with no single point of failure:
  - Managers are symmetrically replicated on all nodes
  - Synchronized view of system state via the Spread group communication bus
  - Managers take local action without requiring coordination, giving faster recovery and better scalability


Resource Monitoring Results
- Tested the scalability of the resource-monitoring software
- Measured the round-trip time of batch image transfers
- Observed good scaling of the Advisor's monitoring overhead:
  - Linear as the advising-update frequency increases
  - Linear as the number of nodes increases
  - Linear as network traffic increases

Fault-Tolerance Advising Results
- Tested the use of interceptor libraries for network monitoring
  - The monitor library intercepted network system calls
  - Monitored incoming and outgoing network traffic for each process
- Measured the round-trip time of batch image transfers
- Low performance impact:
  - Minimal increase in average round-trip time (~4%)
  - Minimal increase in jitter

Offline Program Analysis
- The application may contain RT vs. FT conflicts
- The application may be non-deterministic: multithreading, direct access to I/O devices, local timers
- The program analyzer sifts interactively through application code:
  - To pinpoint sources of conflict between real-time and fault tolerance
  - To determine the size of state, and to estimate recovery time
  - To determine the appropriate points in the application for incremental checkpointing
  - To highlight, and to compensate for, sources of non-determinism (see the toy scan below)
- The offline program analyzer feeds its recovery-time estimates to the Fault-Tolerance Advisor
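A real analyzer works on a program representation, but as a toy illustration of the non-determinism hunt, the C++ sketch below scans a source file for calls that commonly introduce non-determinism (threads, timers, direct I/O). The default file name and the suspect list are assumptions made for the example.

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    int main(int argc, char** argv) {
        // Calls that often make behavior depend on scheduling, time, or devices.
        const std::vector<std::string> suspects = {
            "pthread_create", "gettimeofday", "rand", "ioctl"};
        std::ifstream src(argc > 1 ? argv[1] : "app.cpp");  // file under analysis
        std::string line;
        for (int n = 1; std::getline(src, line); ++n)
            for (const auto& s : suspects)
                if (line.find(s) != std::string::npos)
                    std::cout << "line " << n
                              << ": possible non-determinism (" << s << ")\n";
    }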

Summary
- Resolving trade-offs between real-time and fault tolerance
  - Bounding fault-detection and recovery times in an asynchronous environment
  - Estimating worst-case performance in fault-free, faulty, and recovery cases
- MEAD's RT-FT middleware support
  - Tolerance of crash, communication, and timing faults
  - Proactive dependability framework
  - Fault-Tolerance Advisor to take the guesswork out of configuring reliability
- Release of MEAD on Emulab
  - Active and warm passive replication for CORBA
  - Uses the Spread group communication system for the underlying transport
  - Ongoing work on fault-tolerant CCM with active replication

For More Information on MEAD
http://www.ece.cmu.edu/~mead

Priya Narasimhan
Assistant Professor of ECE and CS
Carnegie Mellon University
Pittsburgh, PA 15213-3890
Tel: +1-412-268-8801
priya@cs.cmu.edu