Replication on Virtual Machines



Replication on Virtual Machines Siggi Cherem CS 717 November 23rd, 2004

Outline
1. Introduction
   - The Java Virtual Machine
2. Napper, Alvisi, Vin - DSN 2003
   - Introduction
   - JVM as state machine
   - Addressing non-determinism
   - Implementation
   - Experiments
3. Friedman, Kama - SRDS 2003
   - Introduction
   - Non-determinism
   - Design and implementation
   - Experimentation

JVM philosophy Compile once, run everywhere Java bytecodes bytecode = instruction set of Java Virtual Machine One JVM for each architecture High-level support Memory management (garbage collection) Multithreading support (monitors)

JVM to real machines Internal components Dynamic class loader Interpreters vs. Just in Time compiling (JIT) Native methods (JNI) Provided libraries Allocation and garbage collection User-level vs. native threads

Random technical details A few characteristics Compact bytecodes (202 instructions) Types are preserved for safety, precise GC Objects accessible through references Strong, soft, weak, phantom references Object can be shared Passed to new thread constructor Static fields

Part 2: Napper, Alvisi, Vin - DSN 2003

General idea Their work Modify JVM to tolerate fail-stop failures. Extends hypervisor-based fault-tolerance Hypervisor model Implement a virtual state machine over underlying hardware Perform replica coordination in the hypervisor

State machine approach
Requirements
1. Determinism: defining replicas
2. Independence: implementing replicas
3. Choice replication: ensuring replication
4. Transparency: guaranteeing single output
A state machine
Read set -> [deterministic command] -> Write set, output to environment
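The determinism requirement can be illustrated with a toy replica. This is a minimal sketch, not the papers' design: the class `CounterMachine` and its command format are hypothetical. Each command reads one variable (the read set), writes it back (the write set), and appends to an output string, so two replicas fed the same command sequence produce the same output.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy state machine; names are illustrative only.
final class CounterMachine {
    private final Map<String, Integer> state = new HashMap<>();
    private final StringBuilder output = new StringBuilder();

    // A deterministic command: the write set and the output depend only on
    // the read set (the current value of the variable) and the argument.
    void apply(String variable, int delta) {
        int read = state.getOrDefault(variable, 0);   // read set
        int written = read + delta;
        state.put(variable, written);                 // write set
        output.append(variable).append('=').append(written).append(';'); // output
    }

    String output() {
        return output.toString();
    }

    // Feeds a command sequence to a fresh replica and returns its output.
    static String run(List<String[]> commands) {
        CounterMachine m = new CounterMachine();
        for (String[] c : commands) {
            m.apply(c[0], Integer.parseInt(c[1]));
        }
        return m.output();
    }
}
```

Running `CounterMachine.run` twice on the same list plays the role of a primary and a backup: as long as every command is deterministic, the two outputs agree.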

A state machine for the JVM Challenges Non-determinism of commands Replication of sequence of commands Copying read-sets Multithreading Their approach: Bytecode execution engines (BEE) A BEE is a state machine JVM = set of BEEs, one for each application thread Replication at BEE level

Sources of non-determinism Some causes of non-determinism Asynchronous commands Non-deterministic commands Non-deterministic read sets Output to environment

Asynchronous commands
Definition: A command is asynchronous if it can appear anywhere in the BEE's sequence of commands.
Examples
- Hardware interrupts (not for the JVM)
- Asynchronous Java exceptions
  - Fatal errors, e.g. no resources, deadlocks
  - Killing another thread, i.e. Thread.stop()
Added restrictions
1. Fatal exceptions are not replicated
2. Threads must not call Thread.stop()

Non-deterministic commands
Definition: A command is non-deterministic if its write-set or output is not uniquely determined by the read-set values.
Example: native calls (I/O, clock)
Solution
- Agreement between replicas on input environment and read-set. Not possible! Input is outside the JVM's control
- Backup must adopt the primary's write-set
- Restrict output on native methods

Non-deterministic commands
Added restrictions
3. Native methods produce deterministic output to environment
4. Native methods invoke other methods deterministically
Handling these conditions: splitting methods
Before:
    /* reads the clock and produces output in one method */
    void non_determ_read_write() {
        long r = read_clock();
        printf("%ld\n", r);
    }
After:
    /* non-deterministic part: only reads the environment */
    long non_determ_read() {
        return read_clock();
    }
    /* deterministic part: output depends only on the argument */
    void determ_write(long r) {
        printf("%ld\n", r);
    }
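The same split can be sketched on the Java side. This is an illustration under assumptions, not code from the paper: `SplitNative`, `nonDetermRead`, and `determWrite` are hypothetical names, and the clock is passed in as a `LongSupplier` so the non-deterministic source is explicit.

```java
import java.util.function.LongSupplier;

// Hypothetical helper showing the split; not an API of any real JVM.
final class SplitNative {
    // Non-deterministic half: reads the environment, produces no output.
    // Its return value is what the primary would send to the backup.
    static long nonDetermRead(LongSupplier clock) {
        return clock.getAsLong();
    }

    // Deterministic half: the produced text depends only on the argument,
    // so primary and backup produce identical output for the same r.
    static String determWrite(long r) {
        return r + "\n";
    }
}
```

On a primary one might call `SplitNative.determWrite(SplitNative.nonDetermRead(System::nanoTime))`, while a recovering backup would call `determWrite` with the value the primary logged.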

Non-deterministic read sets Definition A read set is non-deterministic if it contains a shared variable. Examples Invoking methods on shared objects Storing objects in static references Alternatives Bookkeeping of shared data: an order of magnitude overhead Lock acquisition ordering, needs data-race elimination Exclusive access to all variables while thread is scheduled

Non-deterministic read sets
Added restrictions
5. Include one of the following
   1. No data-races (protect any shared data with monitors)
   2. Exclusive access to all shared variables
Example
    class Example {
        static F shared = null;
        String toString() {
            if (null == shared) {
                shared = new F();
                synchronized_call();
                ...

Output to the environment
Definitions
- An output is idempotent if it is independent of the number of times a command is executed.
- An output is testable when the environment can be tested for occurrence of the output.
Examples
- cd /home/siggi is idempotent, cd .. is not
- cd .. is testable if pwd is available
Definition: A state is volatile if it does not survive failure of the state machine. Otherwise, it is stable.

Output to the environment Solution Only support idempotent and testable output. Volatile data might be necessary for correct operation Side effects handlers: replicate lost volatile state of the primary Added restrictions 6 Output of native methods is idempotent or testable 7 Native methods annotated for volatile output
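The cd example can be made concrete with a toy model of a working directory. This is a sketch under assumptions: the class `Shell` and its methods are hypothetical, and `replayCdParent` shows one possible way a backup could use a test to get exactly-once behavior for a non-idempotent command.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory model of a shell's working directory.
final class Shell {
    private final Deque<String> cwd = new ArrayDeque<>();

    // Idempotent: executing it twice leaves the same state as executing it once.
    void cdAbsolute(String... parts) {
        cwd.clear();
        for (String p : parts) {
            cwd.addLast(p);
        }
    }

    // Not idempotent: every execution moves one level further up.
    void cdParent() {
        if (!cwd.isEmpty()) {
            cwd.removeLast();
        }
    }

    // Makes "cd .." testable: the environment can be inspected.
    String pwd() {
        return "/" + String.join("/", cwd);
    }

    // Replay on the backup: skip the command if the test shows that the
    // primary already performed it (pwd matches the logged result).
    void replayCdParent(String pwdLoggedAfterCommand) {
        if (!pwd().equals(pwdLoggedAfterCommand)) {
            cdParent();
        }
    }
}
```

The idempotent command can simply be re-executed during recovery, while the non-idempotent one is guarded by the test.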

Implementation details
Extended JVM
- New threads for
  - Failure detection and backup initiation
  - Transfer of logging information
- Interact with other system threads: GC, finalization
Threading modification
- Restriction (5.2) requires modifying multi-threading libraries
- Sun's JVM provides both native and green threads
- Native threads are desired to run applications on SMP
- Green threads are desired for portability

Implementation for non-deterministic commands
Initial work
- Inspected and categorized all native methods by hand!
- Found only 100 non-deterministic
Runtime support algorithm
- Create table with each (non-deterministic) method's unique signature
- Native call on primary triggers message to backup
- Backup on recovery uses same values
- Side effect handlers used for volatile state (e.g. file descriptor from an open command)
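The runtime support algorithm can be sketched as a log keyed by method signature. This is a simplified illustration, not the authors' implementation: `NativeResultLog` is a hypothetical class, results are restricted to `long`, and the "message to backup" is modeled as an in-memory queue shared by both roles.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.LongSupplier;

// Hypothetical log of non-deterministic native call results.
final class NativeResultLog {
    // Method signature -> results in the order the primary observed them.
    private final Map<String, Queue<Long>> log = new HashMap<>();

    // Primary: run the real native call and record its result
    // (a real system would send this record to the backup).
    long callOnPrimary(String signature, LongSupplier nativeCall) {
        long result = nativeCall.getAsLong();
        log.computeIfAbsent(signature, k -> new ArrayDeque<>()).add(result);
        return result;
    }

    // Backup during recovery: reuse the primary's value while one is logged;
    // once the log for this method is exhausted, execute the call live.
    long callOnBackup(String signature, LongSupplier nativeCall) {
        Queue<Long> results = log.get(signature);
        if (results != null && !results.isEmpty()) {
            return results.remove();
        }
        return nativeCall.getAsLong();
    }
}
```

Keeping one queue per signature is an assumption made for brevity; it ignores ordering across different methods and threads.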

Non-deterministic read sets: first approach
Replicated lock synchronization
- Assumes (5.1), ensuring mutual exclusion
- Defines lock acquisition record = $(t_i, l_j, t_i^{\#}, l_j^{\#})$
  - Locking thread ($t_i$)
  - Lock ($l_j$)
  - Relative order of lock acquisition
    - thread acquire sequence number ($t_i^{\#}$)
    - lock acquire sequence number ($l_j^{\#}$)
- Primary creates $(t_i, l_j, t_i^{\#}, l_j^{\#})$
- Backup uses $(t_i, l_j, t_i^{\#}, l_j^{\#})$ to repeat ordering

Computing lock acquisition record
Defining record values: not trivial
- Object address for $l_j$: meaningless at replicas
- Order of events: might differ in primary and backup
- Recursive definition for $t_i = (t_p, k)$
  - $t_p$ is the parent of $t_i$
  - $t_i$ is the $k$-th thread created, relative to its siblings
- Use thread determinism for $l_j$
  - $l_j$ is assigned the first time the lock is used
  - Log the map $l_j \mapsto (t_i, t_i^{\#})$
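The recursive thread identifier can be sketched as a path from the root thread. This is an assumption-laden illustration: `ThreadNaming` is a hypothetical class, and representing $(t_p, k)$ as a list of child indices is one possible encoding, not necessarily the one used in the paper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replica-independent thread naming: an id is the parent's id
// followed by k, where the thread is the k-th child created by that parent.
final class ThreadNaming {
    private final Map<List<Integer>, Integer> childrenCreated = new HashMap<>();

    static List<Integer> root() {
        return List.of();
    }

    // Returns the id (parent, k) for the next child of the given parent.
    List<Integer> spawn(List<Integer> parent) {
        int k = childrenCreated.merge(parent, 1, Integer::sum);
        List<Integer> id = new ArrayList<>(parent);
        id.add(k);
        return List.copyOf(id);
    }
}
```

Because the id depends only on the parent and on the order in which that parent creates children, a primary and a backup assign the same id to corresponding threads even when threads of different parents are created in a different global order.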

Using lock acquisition record
Recovery algorithm
- Case 1: Backup thread $t_i$ tries to acquire $l_j$, and the log contains $r = (t_i, l_j, t_i^{\#}, l_j^{\#})$
  - Wait until we reach $l_j^{\#}$
  - Remove $r$ from the log
- Case 2: Backup thread $t_i$ tries to acquire $l_j$, and the log doesn't contain $(t_i, l_j, t_i^{\#}, l_j^{\#})$
  - Wait until the log is empty (end of recovery protocol)
- Case 3: Backup thread $t_i$ tries to acquire a lock with no id, and the log contains the map $l_j \mapsto (t_i, t_i^{\#})$
  - Assign the lock the primary's $l_j$
  - Remove the map entry
- Case 4: Backup thread $t_i$ tries to acquire a lock with no id, and the log doesn't contain the map $l_j \mapsto (t_i, t_i^{\#})$
  - Wait until either
    - a thread assigns $l_j$ to the lock, or
    - the log contains no more maps (assign a fresh $l_j$)
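Cases 1 and 2 of the recovery algorithm can be sketched as a gate that the backup consults before granting a lock. This is a simplified, single-threaded model under assumptions: `LockReplay` is a hypothetical class, thread and lock ids are plain numbers, lock sequence numbers are assumed to start at 0, and cases 3 and 4 (locks without an id) are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical replay gate for logged lock acquisitions.
final class LockReplay {
    // Remaining records from the primary: {thread id, lock id, lock sequence number}.
    private final Deque<long[]> log = new ArrayDeque<>();
    // Lock id -> sequence number of the next acquisition to grant.
    private final Map<Long, Long> nextSeq = new HashMap<>();

    void addRecord(long thread, long lock, long lockSeq) {
        log.add(new long[] {thread, lock, lockSeq});
    }

    // Returns true if the thread may take the lock now;
    // false means "wait and retry later".
    boolean tryAcquire(long thread, long lock) {
        for (long[] r : log) {
            if (r[0] == thread && r[1] == lock) {
                // Case 1: a record exists, so wait until the lock reaches its l#.
                if (nextSeq.getOrDefault(lock, 0L) != r[2]) {
                    return false;
                }
                nextSeq.put(lock, r[2] + 1);
                log.remove(r); // safe: we return without continuing the iteration
                return true;
            }
        }
        // Case 2: no record for this attempt, so wait until the log is empty.
        return log.isEmpty();
    }
}
```

A real implementation would block the calling thread instead of returning false; the boolean result keeps the sketch deterministic and easy to check.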

Non-deterministic read sets: second approach
Replicated thread scheduling
- Assumes (5.2), all shared data is protected
- Defines thread scheduling record = $(bn, pc, m, l^{\#}, t_n)$
  - Code executed
    - Current program counter ($pc$)
    - Trace summary to get there ($bn$)
  - Monitor uses ($m$)
  - Thread was waiting on a lock ($l^{\#}$)
  - Next scheduled thread ($t_n$)
- Log record on each context switch

Computing thread scheduling record Defining program position How many statements were executed? Avoid counting each instruction bn counts branches, jumps and invocations taken. pc is program counter offset (not absolute address): updated on every instruction!
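A small sketch shows why the pair (bn, pc) identifies a position where pc alone would not. The class `ProgressCounter` and its `step` method are hypothetical; the interpreter loop that would call them is assumed, not shown.

```java
// Hypothetical progress tracker for one thread.
final class ProgressCounter {
    private long bn; // branches, jumps and invocations taken so far
    private int pc;  // offset of the current instruction within its method

    // Called for every executed instruction: pc is always updated,
    // bn only when the instruction transferred control.
    void step(int nextPc, boolean controlTransfer) {
        if (controlTransfer) {
            bn++;
        }
        pc = nextPc;
    }

    // True when the thread is exactly at the logged position.
    boolean at(long targetBn, int targetPc) {
        return bn == targetBn && pc == targetPc;
    }
}
```

Inside a loop the same pc recurs on every iteration, but each backward jump increments bn, so the backup can tell the iterations apart without counting every instruction.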

Computing thread scheduling record
Defining program position
- What if preemption occurred inside a native method?
  - Can't control preemption outside the JVM
  - On recovery, preemption before native call?
- Need to keep track of locks acquired
  - Locking done in JVM monitors
  - On recovery, preempt when $m$ is reached

Interaction with system threads
Example: Heap shared with GC. GC not in Java!
Problems
- $t_i$ acquires a lock at primary with no contention, but $t_i$ waits at backup
  - $t_n$ can enter at backup before it should!
  - User-level threads to force $t_i$ to stay
- $t_i$ acquires a lock at backup with no contention, but $t_i$ waits at primary
  - Use $m$ also to force rescheduling at backup

Replicated scheduling: final details
wait() and notifyAll()
- Multiple threads awakened
- Store the $l^{\#}$ to preserve order in backup
Finishing recovery
- Log becomes empty, last entry contains $t_n$
- Backup must schedule $t_n$ to reproduce interaction with the environment

Garbage Collection Common problems Soft/weak references Primary and backup may diverge Convert them to strong references Finalizers should be no source of non-determinism Replicate as before

Output to the environment
Side effects handlers
- Store and recover volatile state
- Ensure exactly-once semantics for output
- Composed of 5 methods: register, test, log, receive, restore

Components of SE handlers Method register Provide method signature Non-determinism flag Output command flag Arguments used for output Method test Used by backup True if output command was successfully executed Only defined for testable commands Idempotent commands are replayed

Components of SE handlers Method log Used by primary after an output command Saves arguments, return value and internal state Produces a message with recovery information Method receive Used by backup to retrieve result of log Can perform compaction of messages Method restore Used by backup only once to recover volatile state Uses received messages
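The five methods can be pictured as an interface. This is a hedged sketch: the signatures below are assumptions chosen for illustration (the slides only give the method names and roles), and `OpenFileHandler` is a hypothetical handler whose volatile state is the set of files that are open on the primary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of a side effect handler; signatures are assumed.
interface SideEffectHandler {
    String register();                        // signature of the native method handled
    boolean test(String argument);            // backup: did the output already happen?
    String log(String command, String argument); // primary: build a recovery message
    void receive(String message);             // backup: store (and compact) messages
    void restore();                           // backup, once: rebuild volatile state
}

// Hypothetical handler that tracks which files are open.
final class OpenFileHandler implements SideEffectHandler {
    private final List<String> pending = new ArrayList<>(); // still open on the primary
    final List<String> reopened = new ArrayList<>();        // stands in for file descriptors

    public String register() {
        return "open(Ljava/lang/String;)I"; // illustrative signature only
    }

    public boolean test(String path) {
        return reopened.contains(path);
    }

    public String log(String command, String path) {
        return command + ":" + path;
    }

    public void receive(String message) {
        String[] parts = message.split(":", 2);
        if (parts[0].equals("open")) {
            pending.add(parts[1]);
        } else {
            pending.remove(parts[1]); // compaction: a close cancels the earlier open
        }
    }

    public void restore() {
        reopened.addAll(pending);
        pending.clear();
    }
}
```

In this model the backup only reopens files that were still open when the primary failed, which is the kind of volatile state the handlers are meant to recover.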

Experimental setting Architecture and settings Sun E5000 Servers. 15 400MHz UltraSPARC II CPUs. 2GB Mem. 100Mbps Ethernet Primary and backup run on different machines Log is kept at backup in volatile memory Synchronization on each output (acks) Interpreted mode, no JIT 3 scenarios: AL, TS, NoFT Only green threads (native on SMP yield similar result)

Experimental setting
Benchmarks
- SPEC JVM98 benchmarks
- Results shown for 6 benchmarks
  - compress: CPU intensive
  - db: database, heavy on locking
  - mtrt: only multithreaded

[Figure: Algorithms comparison. Running times under two algorithms]

[Figure: Overhead of lock acquisition algorithm]

[Figure: Overhead of thread scheduling algorithm]

Part 3: Friedman, Kama - SRDS 2003

Introduction
Another hypervisor
- Built on top of Jikes RVM
- Ignore native code
- Support JIT
Jikes RVM
- Almost all in Java
- Yield points and time slices

Sources of non-determinism
Multithreading
- Use deterministic scheduler (yield points)
- Deterministic dequeuing
Data-races on SMPs
- Assume no data-races, enforce lock ordering

Design decisions Frames One frame lag between primary and backup Synchronize with replicas before starting a new frame Send all I/O results to replicas at start point Send locks (on SMP) anywhere Send non-deterministic read sets
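The frame idea can be sketched from the primary's side. This is an interpretation under assumptions, not the authors' code: `FramedPrimary` is hypothetical, a frame is taken to be a fixed number of context switches (the frame size varied in the experiments), and the network is replaced by an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical primary that batches non-deterministic results per frame.
final class FramedPrimary {
    private final int frameSize;                         // context switches per frame
    private final List<String> current = new ArrayList<>();
    private int switches;
    final List<List<String>> sent = new ArrayList<>();   // stands in for the network

    FramedPrimary(int frameSize) {
        this.frameSize = frameSize;
    }

    // Record an I/O result (or other non-deterministic input) seen in this frame.
    void recordEvent(String event) {
        current.add(event);
    }

    // At a frame boundary, ship everything recorded so far; the backup can
    // then replay that frame while the primary runs the next one.
    void contextSwitch() {
        switches++;
        if (switches % frameSize == 0) {
            sent.add(List.copyOf(current));
            current.clear();
        }
    }
}
```

A larger frame size means fewer synchronization points but more work for the backup to redo after a failure, which is the trade-off the frame-size experiments explore.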

Frames example Framing...

Implementation details
Replication engine
- Additional module to Jikes RVM
- Communication between primary and backup
- Detection of failures
- Election of new primary

Implementation details
Hurdles
- JIT compilation: saving thread switch counter
- Non-deterministic number of statements: also disable preemption
- Garbage collection on SMP: cooperative threads; GC non-preemptive until all are done

Experimental setting Benchmarks Some Spec JVM98 Benchmarks and SciMark scimark, compress, db, raytrace, mtrt Variations on frame-size (number of context switches)

[Figure: Compress. compress benchmark]

[Figure: Database. db benchmark]

[Figure: Raytrace. raytrace benchmark]

[Figure: Multithreaded Raytrace. mtrt benchmark]

[Figure: Replication Overhead]

Final remarks
Summary
- Common technique: hypervisor model
- Restrictions to solve non-determinism
- Support for SMPs
First paper main features
- SE handlers: native methods
Second paper main features
- Frames
- Lower synchronization
- Faster recovery