State-Machine Replication

Similar documents
Replication on Virtual Machines

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

Virtualization. Jia Rao Assistant Professor in CS

Full and Para Virtualization

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

COS 318: Operating Systems. Virtual Machine Monitors

Virtual Machine Security

Virtualization Technology. Zhiming Shen

Chapter 14 Virtual Machines

12. Introduction to Virtual Machines

Virtual machines and operating systems

Remus: : High Availability via Asynchronous Virtual Machine Replication

Distributed Systems. Virtualization. Paul Krzyzanowski

IOS110. Virtualization 5/27/2014 1

Virtualization. Types of Interfaces

Uses for Virtual Machines. Virtual Machines. There are several uses for virtual machines:

CPS221 Lecture: Operating System Structure; Virtual Machines

Virtualization. Pradipta De

Chapter 16: Virtual Machines. Operating System Concepts 9 th Edition

Introduction to Virtual Machines

B) Using Processor-Cache Affinity Information in Shared Memory Multiprocessor Scheduling

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

Introduction to Operating Systems. Perspective of the Computer. System Software. Indiana University Chen Yu

Week Overview. Installing Linux Linux on your Desktop Virtualization Basic Linux system administration

Eloquence Training What s new in Eloquence B.08.00

Virtual Machines. Virtualization

The Xen of Virtualization

Chapter 11: Input/Output Organisation. Lesson 06: Programmed IO

Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert

x86 ISA Modifications to support Virtual Machines

Virtualization and the U2 Databases

Security Overview of the Integrity Virtual Machines Architecture

Anh Quach, Matthew Rajman, Bienvenido Rodriguez, Brian Rodriguez, Michael Roefs, Ahmed Shaikh

Knut Omang Ifi/Oracle 19 Oct, 2015

Chapter 5 Cloud Resource Virtualization

Virtualization. Michael Tsai 2015/06/08

Cloud Computing #6 - Virtualization

Fault Tolerant Solutions Overview

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service. Eddie Dong, Tao Hong, Xiaowei Yang

COS 318: Operating Systems. Virtual Machine Monitors

Hypervisors. Introduction. Introduction. Introduction. Introduction. Introduction. Credits:

Virtual Servers. Virtual machines. Virtualization. Design of IBM s VM. Virtual machine systems can give everyone the OS (and hardware) that they want.

Models For Modeling and Measuring the Performance of a Xen Virtual Server

Virtual Hosting & Virtual Machines

HELSINKI METROPOLIA UNIVERSITY OF APPLIED SCIENCES. Information Technology. Multimedia Communications MASTER S THESIS

Chapter 3 Operating-System Structures

CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson

Virtual Machines.

Multi-core Programming System Overview

System Structures. Services Interface Structure

I/O. Input/Output. Types of devices. Interface. Computer hardware

Virtual Machine Synchronization for High Availability Clusters

Embedded Systems. 6. Real-Time Operating Systems

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

The Microsoft Windows Hypervisor High Level Architecture

JavaPOS TM FAQ. What is an FAQ? What is JavaPOS?

Basics of Virtualisation

Cloud Computing CS

Basics in Energy Information (& Communication) Systems Virtualization / Virtual Machines

KVM: A Hypervisor for All Seasons. Avi Kivity avi@qumranet.com

Cloud Computing. Up until now

Operating System Structures

OS Concepts and structure

Avoid a single point of failure by replicating the server Increase scalability by sharing the load among replicas

G-Cloud 6 brightsolid Secure Cloud Servers. Service Definition Document

Cloud^H^H^H^H^H Virtualization Technology. Andrew Jones May 2011

Enterprise Deployment: Laserfiche 8 in a Virtual Environment. White Paper

COS 318: Operating Systems. I/O Device and Drivers. Input and Output. Definitions and General Method. Revisit Hardware

Virtualization. Dr. Yingwu Zhu

Regional SEE-GRID-SCI Training for Site Administrators Institute of Physics Belgrade March 5-6, 2009

Implementing and Managing Windows Server 2008 Hyper-V

Intro to Virtualization

A Deduplication File System & Course Review

EMC Celerra Unified Storage Platforms

Design of a High-Availability Multimedia Scheduling Service using Primary-Backup Replication

COS 318: Operating Systems

Have both hardware and software. Want to hide the details from the programmer (user).

CHAPTER 1: OPERATING SYSTEM FUNDAMENTALS

The Art of Virtualization with Free Software

CHAPTER 15: Operating Systems: An Overview

Virtualization with Windows

Virtualization for Hard Real-Time Applications Partition where you can Virtualize where you have to

Objectives. Chapter 2: Operating-System Structures. Operating System Services (Cont.) Operating System Services. Operating System Services (Cont.

nanohub.org An Overview of Virtualization Techniques

Virtual Machine Monitors. Dr. Marc E. Fiuczynski Research Scholar Princeton University

Virtualization. Explain how today s virtualization movement is actually a reinvention

6422: Implementing and Managing Windows Server 2008 Hyper-V (3 Days)

CS5460: Operating Systems. Lecture: Virtualization 2. Anton Burtsev March, 2013

GPFS-OpenStack Integration. Dinesh Subhraveti IBM Research

The Google File System

Chapter 2 System Structures

Virtual Machines. Virtual Machines

Virtualization for Cloud Computing

Development of Type-2 Hypervisor for MIPS64 Based Systems

Transcription:

State-Machine Replication

The Problem Clients Server

The Problem Clients Server

The Problem Clients Server

The Problem Clients Server

The Problem Clients Server

The Problem Clients Server Solution: replicate server!

The Solution

The Solution 1. Make server deterministic (state machine)

The Solution 1. Make server deterministic (state machine) State machine

The Solution 1. Make server deterministic (state machine) 2. Replicate server State machines

The Solution 1. Make server deterministic (state machine) 2. Replicate server State machines

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions State machines

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions Clients State machines

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions Commands Clients State machines

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions Commands Clients State machines

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions 4. Vote on replica outputs for fault-tolerance Clients State machines

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions 4. Vote on replica outputs for fault-tolerance Clients State machines

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions 4. Vote on replica outputs for fault-tolerance Clients State machines Voter

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions 4. Vote on replica outputs for fault-tolerance Clients State machines Voter

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions 4. Vote on replica outputs for fault-tolerance Clients State machines Voter

The Solution 1. Make server deterministic (state machine) 2. Replicate server 3. Ensure correct replicas step through the same sequence of state transitions 4. Vote on replica outputs for fault-tolerance Clients State machines Voter

A conundrum

A conundrum

A conundrum

A conundrum

A conundrum

A conundrum...

A conundrum... A: voter and client share fate!

A conundrum... A: voter and client share fate!

A conundrum... A: voter and client share fate!

A conundrum... A: voter and client share fate!

A conundrum... A: voter and client share fate!

A conundrum... A: voter and client share fate!

State Machines Set of state variables + Sequence of commands A command Reads its read set values (opt. environment) Writes to its write set values (opt. environment) A deterministic command Produces deterministic wsvs and outputs on given rsv A deterministic state machine Reads a fixed sequence of deterministic commands

Semantic Characterization of a State Machine Outputs of a state machine are completely determined by the sequence of commands it processes, independent of time and any other activity in a system

Replica Coordination All non-faulty state machines receive all commands in the same order Agreement: Every non-faulty state machine!! receives every command Order: Every non-faulty state machine processes the!commands it receives in the same order

Where should RC be implemented? In hardware sensitive to architecture changes At the OS level state transitions hard to track and coordinate At the application level requires sophisticated application programmers

Hypervisor-based Fault-tolerance Implement RC at a virtual machine running on the same instruction-set as underlying hardware Undetectable by higher layers of software One of the great come-backs in systems research! CP-67 for IBM 369 [1970] Xen [SOSP 2003], VMware

The Hypervisor as a State Machine Two types of commands virtual-machine instructions virtual-machine interrupts (with DMA input) State transition must be deterministic...but some VM instructions are not (e.g. timeof-day) interrupts must be delivered at the same point in command sequence

The Architecture Good-ol Primary-Backup Ethernet Primary makes all nondeterministic choices I/O Accessibility Assumption Primary and backup have access to same I/O operations Primary HP 9000/720 Backup HP 9000/720 I/O Device

Ensuring identical command sequences Ordinary (deterministic) instructions Environment (nondeterministic) instructions

Ensuring identical command sequences Ordinary (deterministic) instructions Environment (nondeterministic) instructions Environment Instruction Assumption Hypervisor captures all environmental instructions, simulates them, and ensures they have the same effect at all state machines

Ensuring identical command sequences Ordinary (deterministic) instructions Environment (nondeterministic) instructions Environment Instruction Assumption VM interrupts must be delivered at same point in instruction sequence at all replicas

Ensuring identical command sequences Ordinary (deterministic) instructions Environment (nondeterministic) instructions Environment Instruction Assumption VM interrupts must be delivered at same point in instruction sequence at all replicas Instruction Stream Interrupt Assumption Hypervisor can be invoked at specific point in the instruction stream

Ensuring identical command sequences Ordinary (deterministic) instructions Environment (nondeterministic) instructions Environment Instruction Assumption VM interrupts must be delivered at same point in instruction sequence at all replicas Instruction Stream Interrupt Assumption implemented via recovery register interrupts at backup are ignored

The failure-free protocol P0: On processing environment instruction i at pc, HV of primary p : P3: If b HV processes environment! instruction i at pc : sends![e! p, pc,! Val i ] to backup b b waits for [e b, pc, Val i ] from p waits for ack returns Val i P1: If p HV receives Int from its VM: p buffers Int P2: If epoch ends at p : p sends to b all buffered Int in e p p p waits for ack delivers all VM Int in e p e p := e p +1 p starts e p If b receives [E, pc, Val] from p: b sends ack to p b buffers Val for delivery at E, pc P4: If b HV receives Int from its VM b ignores Int P5: If epoch ends at b : b waits from pfor interrupts for e b b sends ack to p b delivers all VM Intbuffered in e b e b := e b +1 b starts e b

If the primary fails P6: If b receives a failure notification instead of [e b, pc, Val i ], b executes i If in P5 of Int: b receives failure notification instead e b := e b +1 b b e b starts <--- failover epoch is promoted primary for epoch e b +1 If p crashes before sending Int to b, Int is lost!

SMR and the environment On outputs, no exactly-once guarantee on outputs On primary failure, avoid input inconsistencies time must increase monotonically at epoch, primary informs backup of value of its clock interrupts must be delivered as a fault-free processor would but interrupts can be lost... weaken constraints on I/O interrupts

On I/O device drivers IO1: If an I/O instruction is executed and the I/O operation performed, the processor issuing the instruction delivers a completion interrupt, unless it fails. Either way, the I/O device is unaffected. IO2: An I/O device may cause an uncertain interrupt (indicating the operation has been terminated) to be delivered by the processor issuing the I/O instruction. The instruction could have been in progress, completed, or not even started. On an uncertain interrupt, the device driver reissues the corresponding I/O instruction not all devices though are idempotent or testable

Backup promotion and uncertain interrupts P7: The backup s VM generates an uncertain interrupt for each I/O operation that is outstanding right before the backup is promoted primary (at the end of the failover epoch)

The Hypervisor prototype Supports only one VM to eliminate issues of address translation Exploits unused privileged levels in HP s PA-RISC architecture (HV runs at level 1) To prevent software to detect HV, hacks one assembly HP-UX boot instruction

RC in the Hypervisor Nondeterministic ordinary instructions (Surprise!)

RC in the Hypervisor Nondeterministic ordinary instructions (Surprise!) TLB replacement policy non-deterministic TLB misses handled by software Primary and backup may execute a different number of instructions! HV takes over TLB replacement

RC in the Hypervisor Nondeterministic ordinary instructions (Surprise!) TLB replacement policy non-deterministic TLB misses handled by software Primary and backup may execute a different number of instructions! Optimizations p p sends HV takes over TLB replacement Int immediately blocks for acks only before output commit

The JVM as a State Machine Asynchronous commands interrupts Non-deterministic commands read time-of-day Non-deterministic read set values multi-threaded access to shared data Output to the environment simulate a single, fault-tolerant state machine

Non-deterministic Commands Only invoked through Java Native Interface (JNI) direct access to OS and other libraries implement windowing, I/O, read HW clock Executes outside the JVM: can t agree on inputs!

Non-deterministic Commands Only invoked through Java Native Interface (JNI) direct access to OS and other libraries implement windowing, I/O, read HW clock Executes outside the JVM: can t agree on inputs!

Non-deterministic Commands Only invoked through Java Native Interface (JNI) direct access to OS and other libraries implement windowing, I/O, read HW clock Executes outside the JVM: Force agreement on the wsvs can t agree on inputs!

Non-deterministic Commands Only invoked through Java Native Interface (JNI) direct access to OS and other libraries implement windowing, I/O, read HW clock Not out of the woods: Executes outside the JVM: Force agreement on the wsvs can t agree on inputs! Non-deterministic output to the environment Non-deterministic method invocation