Computer Lab Software Fault-tolerance: Task Process Pairs




Systems Engineering Group, Dresden University of Technology
http://wwwse.inf.tu-dresden.de/
January 25, 2013

One task less!
- Too many tasks to present!
- Task "state correction" removed
- Do you still need tasks? Talk to me at the end of the session...

Outline
Task "Process Pair":
1. Process Pairs
2. Scenario
3. Task

Process Pairs
- a watch dog process monitors an unreliable worker process
- the watch dog spawns a new worker as soon as the old one crashes

Implementation
- the watch dog uses fork to spawn a new child process
- the child process starts offering its service right after the fork
- the parent process uses waitpid to block as long as the child process is running
- when waitpid returns, the child has crashed, so the watch dog spawns a new child
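The fork/waitpid loop described above can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the helper name run_process_pair, the attempt counter passed to the worker, and the max_restarts limit are assumptions made for this example. Unlike the slide, the sketch also distinguishes a clean exit (status 0) from a crash, so that a worker that finishes its job is not respawned.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not part of watch_dog.cc.
// Runs worker(attempt) in a child process and forks a replacement whenever
// the child ends abnormally. Returns the number of restarts that were needed,
// or -1 if fork/waitpid failed or max_restarts was exceeded.
int run_process_pair(const std::function<void(int)>& worker, int max_restarts) {
    assert(max_restarts >= 0);
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        std::fflush(nullptr);              // do not duplicate buffered output in the child
        const pid_t pid = fork();
        if (pid < 0) return -1;            // fork failed (no degradation in this sketch)
        if (pid == 0) {                    // child: act as the worker
            worker(attempt);
            std::fflush(nullptr);          // keep the worker's buffered output
            _exit(EXIT_SUCCESS);
        }
        int status = 0;                    // parent: act as the watch dog
        pid_t done;
        do {
            done = waitpid(pid, &status, 0);   // blocks while the child is running
        } while (done < 0 && errno == EINTR);
        if (done < 0) return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS)
            return attempt;                // the worker finished its job
        // Killed by a signal or non-zero exit: treat as a crash and respawn.
    }
    return -1;
}
```

A call such as run_process_pair(job, 5) would restart job up to five times. The child leaves via _exit so that it never returns into the watch dog's loop.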

Extensions
Graceful degradation for fork:
- the watch dog process becomes the worker if fork fails
- in case of a crash, the watch dog process is not restarted
Check-pointing the worker:
- the worker saves its current state on HDD at reasonable intervals
- before the worker starts with its job, it restores the state of its crashed predecessor
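A minimal sketch of the check-pointing idea, assuming the worker's whole state is a single index (for example, the position of the next URL to test). The function names, the file format, and the write-then-rename step are assumptions of this example, not something the slides prescribe. The rename is used so that a crash in the middle of saving cannot leave a half-written check point behind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: stores the index of the next item to process.
// The value is written to a temporary file first and then renamed, so the
// check point file always holds either the old or the new value.
bool save_checkpoint(const std::string& path, std::size_t next_index) {
    assert(!path.empty());
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        out << next_index << '\n';
        out.flush();
        if (!out) return false;            // writing failed, keep the old check point
    }
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Hypothetical helper: returns the saved index, or 0 if there is no usable
// check point (first start, or the file cannot be parsed).
std::size_t restore_checkpoint(const std::string& path) {
    std::ifstream in(path);
    std::size_t next_index = 0;
    if (!(in >> next_index)) return 0;
    return next_index;
}
```

A freshly started worker would call restore_checkpoint once before its main loop and save_checkpoint after every completed unit of work.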

Scenario: Watching the Watch Dog
The environment of the watch dog is unreliable!

HTTP-Watch Dog
Details:
- implemented in C++
- files: Makefile, watch_dog.cc, url.lst
- command line help:

    Error: Wrong number of command line arguments
    Usage: watch_dog <url_file> <timeout> <pause>
    <url_file>: file with URLs to monitor (one per line)
    <timeout>: timeout in ms for requests to the server
    <pause>: pause in ms before starting the next request

Example: url.lst
Content of url.lst:

    http://www.heise.de/security/dienste/browsercheck/tests/activex.shtml
    http://wwwse.inf.tu-dresden.de/
    http://wwwse.inf.tu-dresden.de/does_not_exist.html
    http://www.does.not.exist/

Example

    #> ./watch_dog url.lst 2000 5000
    URL file = url.lst
    timeout = 2000 ms
    pause = 5000 ms
    host = www.heise.de; uri = /security/dienste/browsercheck/tests/activex.shtml
    host = wwwse.inf.tu-dresden.de; uri = /
    host = wwwse.inf.tu-dresden.de; uri = /does_not_exist.html
    host = www.does.not.exist; uri = /
    ===> Successful response from host www.heise.de (193.99.144.85): HTTP/1.1 200 OK
    ===> Successful response from host wwwse.inf.tu-dresden.de (141.76.44.180): HTTP/1.1 200 OK
    ===> Host wwwse.inf.tu-dresden.de (141.76.44.180) does not respond with "success"
    ===> response line: HTTP/1.1 404 Not Found
    ===> Could not find host www.does.not.exist
    ===> Reason: Invalid argument

Environment unreliability
- the HTTP Watch Dog crashes nondeterministically within an ongoing request
- the reasons are unknown; they could be:
  - hardware failure (e.g. in the network interface)
  - software bugs in the OS or libraries
  - software bugs in the HTTP watch dog
- searching for the error would be too expensive and too time-consuming

Task Overview
Increase the reliability of the HTTP Watch Dog using the process pair approach. Extend watch_dog.cc with the following features:
- process pairs to protect the execution of void test_server(const URL& url, int timeout)
- graceful degradation in the presence of fork failures
- the worker process saves a check point after each completed HTTP request
- a newly started worker restores its state from an existing check point
Attention: Do not change the output of the function test_server.
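The graceful degradation feature could look roughly like the sketch below: if fork fails, the watch dog runs the job itself and accepts that a crash is then no longer recovered. Everything named here (RunMode, run_with_degradation, the injectable do_fork parameter) is hypothetical and chosen for this illustration. The do_fork parameter exists only so that a fork failure can be simulated with a stub instead of exhausting system resources.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class RunMode { Protected, Degraded, Failed };

// Hypothetical helper: runs job under a watch dog if a child can be forked,
// otherwise runs it unprotected in the calling process.
// Note: a job that always crashes is restarted forever; in practice the job
// makes progress between crashes by restoring a check point.
RunMode run_with_degradation(
    const std::function<void()>& job,
    const std::function<pid_t()>& do_fork = [] { return fork(); }) {
    assert(job && do_fork);
    for (;;) {
        std::fflush(nullptr);              // do not duplicate buffered output
        const pid_t pid = do_fork();
        if (pid < 0) {
            // Graceful degradation: no worker available, so the watch dog
            // becomes the worker. Nobody restarts it if it crashes now.
            job();
            return RunMode::Degraded;
        }
        if (pid == 0) {                    // child: the protected worker
            job();
            std::fflush(nullptr);
            _exit(EXIT_SUCCESS);
        }
        int status = 0;
        pid_t done;
        do {
            done = waitpid(pid, &status, 0);
        } while (done < 0 && errno == EINTR);
        if (done < 0) return RunMode::Failed;
        if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS)
            return RunMode::Protected;
        // The worker crashed: loop and fork a replacement.
    }
}
```

Passing a stub such as []() -> pid_t { return -1; } as do_fork exercises the degraded path without touching the real fork.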

Hints
- you do not need to change the functions test_server and read_url_list
- keep the check point as small as possible
- fork as rarely as possible
- do not change the command line usage
- do not change the format of url.lst
- read the man pages of fork and waitpid
- do not add or change any output statement that starts with TEST_PREFIX

Testing your solution
Attention: test your solution before sending it in. Consider appropriate test strategies:
- simulate several crashes in test_server:
  - use our fault injector (in the initial checkout); run: LD_PRELOAD=fault_injector/fault_injector.so ./watch_dog ...
  - or kill the worker from a second terminal (e.g. with kill or the Windows Task Manager)
- simulate fork failures
- test your check pointing solution

Conclusion
- add process pairs, check pointing, and graceful degradation
- test your solution with fault injectors
- check in: watch_dog.cc
- meet the deadline to get the certificate

Deadline
Deadline: Feb 15th 2012