Evaluating the Effect of Microreboots on End Users

Similar documents
Specification and Implementation of Dynamic Web Site Benchmarks. Sameh Elnikety Department of Computer Science Rice University

A Comparison of Software Architectures for E-Business Applications

Evaluating and Comparing the Impact of Software Faults on Web Servers

Holistic Performance Analysis of J2EE Applications

Tuning Dynamic Web Applications using Fine-Grain Analysis

A Comparison of Software Architectures for E-business Applications

Performance Analysis of Web based Applications on Single and Multi Core Servers

PipeCloud : Using Causality to Overcome Speed-of-Light Delays in Cloud-Based Disaster Recovery. Razvan Ghitulete Vrije Universiteit

Java Management Extensions (JMX) and IBM FileNet System Monitor

Distributed Objects and Components

Database Replication Policies for Dynamic Content Applications

Microreboot A Technique for Cheap Recovery

Autonomous Recovery in Componentized Internet Applications

Performance Comparison of Middleware Architectures for Generating Dynamic Web Content

Insight into Performance Testing J2EE Applications Sep 2008

Enhancing an Application Server to Support Available Components

An Oracle White Paper March Load Testing Best Practices for Oracle E- Business Suite using Oracle Application Testing Suite

Introducing Performance Engineering by means of Tools and Practical Exercises

Autonomic Provisioning of Backend Databases in Dynamic Content Web Servers

JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers

Using Micro-Reboots to Improve Software Rejuvenation in Apache Tomcat

Transactionality and Fault Handling in WebSphere Process Server Web Service Invocations. version Feb 2011

Oracle WebLogic Foundation of Oracle Fusion Middleware. Lawrence Manickam Toyork Systems Inc

Dangers and Joys of Stock Trading on the Web: Failure Characterization of a Three-Tier Web Service

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?

Using Oracle Real Application Clusters (RAC)

Introduction 1 Performance on Hosted Server 1. Benchmarks 2. System Requirements 7 Load Balancing 7

Rice University 6100 Main Street, MS-132 Houston, TX, 77005, USA

How to Mitigate Service Failures

Using DataDirect Connect for JDBC with Oracle Real Application Clusters (RAC)

Monitoring Pramati EJB Server

JAGR: An Autonomous Self-Recovering Application Server

Informatica Data Director Performance

Clustering Versus Shared Nothing: A Case Study

Applications Manager Best Practices document

How Routine Data Center Operations Put Your HA/DR Plans at Risk

Performance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications

A Scalability Study for WebSphere Application Server and DB2 Universal Database

Brocade Virtual Traffic Manager and Oracle EBS 12.1 Deployment Guide

An Oracle White Paper July Oracle Database 12c: Meeting your Performance Objectives with Quality of Service Management

Transactions and the Internet

SERVER CLUSTERING TECHNOLOGY & CONCEPT

Performance Modeling for Web based J2EE and.net Applications

1. Welcome to QEngine About This Guide About QEngine Online Resources Installing/Starting QEngine... 5

JReport Server Deployment Scenarios

Conflict-Aware Scheduling for Dynamic Content Applications

Test Run Analysis Interpretation (AI) Made Easy with OpenLoad

Delivering Quality in Software Performance and Scalability Testing

Monitoring Applications on Pramati Server

New Methods for Performance Monitoring of J2EE Application Servers

A framework for web-based product data management using J2EE

LinuxWorld Conference & Expo Server Farms and XML Web Services

JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing

Desktop and Professional Editions

Newsletter 4/2013 Oktober

MAKING YOUR VIRTUAL INFRASTUCTURE NON-STOP Making availability efficient with Veritas products

Online Backup Client User Manual

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

Solutions for detect, diagnose and resolve performance problems in J2EE applications

System Requirements Table of contents

Virtuoso and Database Scalability

MODERN client-server distributed computing systems

Transaction Manager Failover: A Case Study Using JBOSS Application Server

Dependable Systems. Trends in Software Dependability. Dr. Peter Tröger. Most pictures (C) IBM

SOA management challenges. After completing this topic, you should be able to: Explain the challenges of managing an SOA environment

How to Build an E-Commerce Application using J2EE. Carol McDonald Code Camp Engineer

This Deployment Guide is intended for administrators in charge of planning, implementing and

Installing Oracle 12c Enterprise on Windows 7 64-Bit

Compiere Technical Architecture Modern, configurable, extendible

Configuring Apache Derby for Performance and Durability Olav Sandstå

Oracle Weblogic. Setup, Configuration, Tuning, and Considerations. Presented by: Michael Hogan Sr. Technical Consultant at Enkitec

Performance brief for IBM WebSphere Application Server 7.0 with VMware ESX 4.0 on HP ProLiant DL380 G6 server

Pattern-based J2EE Application Deployment with Cost Analysis

JBOSS OPERATIONS NETWORK (JBOSS ON) MONITORING

Automatic Service Migration in WebLogic Server An Oracle White Paper July 2008

USING VIRTUAL MACHINE REPLICATION FOR DYNAMIC CONFIGURATION OF MULTI-TIER INTERNET SERVICES

Papermule Workflow. Workflow and Asset Management Software. Papermule Ltd

AppDynamics Lite Performance Benchmark. For KonaKart E-commerce Server (Tomcat/JSP/Struts)

25 May Code 3C3 Peeling the Layers of the 'Performance Onion John Murphy, Andrew Lee and Liam Murphy

What is new in Switch 12

.NET 3.0 vs. IBM WebSphere 6.1 Benchmark Results

White Paper. How to Achieve Best-in-Class Performance Monitoring for Distributed Java Applications

1. Introduction 1.1 Methodology

Dependency Free Distributed Database Caching for Web Applications and Web Services

1.0 Hardware Requirements:

EVALUATION ONLY. WA2088 WebSphere Application Server 8.5 Administration on Windows. Student Labs. Web Age Solutions Inc.

BitDefender Security for Exchange

Improving the Performance of Online Auctions Through Server-side Activity-Based Caching

Affinity Aware VM Colocation Mechanism for Cloud

Tutorial: Load Testing with CLIF

Enterprise Application Integration

A Hybrid Web Server Architecture for e-commerce Applications

Chapter 1 - Web Server Management and Cluster Topology

How To Test A Web Server

Internet Engineering: Web Application Architecture. Ali Kamandi Sharif University of Technology Fall 2007

Deploying Windows Streaming Media Servers NLB Cluster and metasan

IBM Business Monitor V8.0 Global monitoring context lab

JBoss AS Administration Console User Guide. by Shelly McGowan and Ian Springer

Java DB Performance. Olav Sandstå Sun Microsystems, Trondheim, Norway Submission ID: 860

Lab - Dual Boot - Vista & Windows XP

Transcription:

Evaluating the Effect of Microreboots on End Users George Candea + Armando Fox with help from Ben Ling and Wes Weimer January 12, 2004

Review Microreboots restart fine-grained components with a clean slate only take a fraction of the time needed for full system reboot They provide a simple recovery technique for Internet services which can be supported entirely in middleware and requires no changes to apps or a priori knowledge of app semantics Low cost use microreboots aggressively even when their necessity is less than certain reduces recovery time reduces time spent detecting/diagnosing failures 2 2004 George Candea

Goals and Preview Determine how infra microreboots impact end users Microreboots are less disruptive in terms of downtime and lost work (compared to full reboots) Reduced (compared to a server process restart) number of failed user requests by 65% perceived downtime by 78% 3 2004 George Candea

Outline 1. Measuring end user experience in interactive services 2. Session-weighted operation throughput 3. End user effects of microrebooting 4 2004 George Candea

Three-Tiered Architecture 5 2004 George Candea

Simulating End Users RUBiS = open-source web-based auction application, modeled on ebay client states correspond to various RUBiS operations, such as Register, PutBid, etc. (27 in total) non-app-specific states: user hits back button and user spontaneously decides to end session client workload described using a state transition table T rows/columns = client's possible states T(s1,s2) = probability of a client clicking from s1 to s2 Emulator uses table T to automatically navigate the web site when in state s1, randomly choose the next state based on T(s) with the requested probability then generates corresponding URL and clicks on it T also has column for wait time inbetween clicking from a state to the next we set this to zero initiate next HTTP request as soon as the current request completes (unlike a real user) Responses classified as incorrect: network-level error (cannot connect to server, etc.) HTTP 4xx or 5xx error code HTML page containing particular keywords ( error, failed ', exception ) Use correct responses to compute goodput 6 2004 George Candea

The Goodput Anomaly All DB conns start failing Goodput = throughput of successfully served requests Is the service available? Is it delivering useful service? Can we increase goodput by failing components? (see total throughput...) 7 2004 George Candea

Sessions Typical interaction of a client with the web site: client goes to the homepage browses around for a while performing different site actions (searching, etc.) accumulates session state decides to do something that touches the persistent-state database (e.g., place a bid, update his/her profile, etc.) assume interactions preceding the persistent-state update are just precursors to crowning moment failed op during session retry or logout+login (typically at homepage) session = sequence of URLs bracketed by homepage and either abandonment of site or return to homepage (new session) definition relies on general user behavior, not app semantics Goodput anomaly explanation discard requests in failed sessions 8 2004 George Candea

The G wop Metric Session-weighted goodput (G wop ) weighs each session by the number of operations in it measures standard throughput of successful/failed requests respectively, but whenever a session fails, all ops in that session are counted as failed Conservative metric for recovery measurements G wop captures the fact that when a long session succeeds, the user got a lot more done than when a short session succeeds 9 2004 George Candea

Outline 1. Measuring end user experience in interactive services 2. Session-weighted operation throughput 3. End user effects of microrebooting 10 2004 George Candea

Effect of Microreboots on End Users Analogous to rebooting, a microreboot is a logical restart of an application component that may be finer-grained than a process Basic microrebootable component of J2EE applications is the EJB Leverage existing JBoss mechanism for cleanly shutting down an EJB Isolate microreboot from fault detection initiate reboot w/out injecting faults i.e., app server instantaneously detects fault and initiates recovery chosen workload covers all possible RUBiS operations runs lasting 1 minute or longer routinely exercised all components workload mix: 85% read ops and 15% DB write ops 11 2004 George Candea

G wop for Reboot-Based Recovery 12 2004 George Candea

Summary of Results Recovery Technique Failed Reqs Downtime [sec] Improvement (served reqs) Improvement (downtime) JBoss Restart 713 108 -- -- RUBiS Restart 615 94 14% 13% 1-EJB microreboot 251 24 65% 78% 3-EJB microreboot 345 29 52% 73% 13 2004 George Candea

Improving End User Experience w/ Transparent Retry Expose EJB microreboot through a RetryLater(t) exception Modified EJB container catches RetryLater(t) exceptions and retries transparently Call succeeds after predetermined number of retries original bean code sees successful call Call fails container throws exception to the caller Masks microreboots (and transient failures) from callers Idempotency: invocations are atomic (call either goes through or RetryLater is thrown) preserves JBoss's regular call semantics In-progress calls fail (upon microreboot) the same way as if the EJB crashed, had a bug, etc. 14 2004 George Candea

Feedback http://crash.stanford.edu Better name for G wop? 15 2004 George Candea

Details on Microreboots Application-generic recovery based on checkpointing applies to relatively few existing applications Part of the appeal of rebooting as a recovery tech is that it discards corrupted transient state that might itself be the cause of the failure or whose cleanup may be necessary in order for recovery to succeed Replace recovery with rebooting (logically equivalent to restarting from a checkpoint that is the start state of the component) is likely to work To ensure that it is safe, we need three environmental conditions: Clear boundary around what is being rebooted Loose coupling Preserving state and consistency Analogous to rebooting, a microreboot is a logical restart of an application component that may be finer-grained than a process, but the same requirements apply Our basic microrebootable component of J2EE applications is the EJB 16 2004 George Candea

RUBiS Details RUBiS = open-source web-based auction application, modeled on ebay offers selling, browsing and bidding three kinds of users: visitor, buyer, and seller, with buyer and seller sessions requiring login buyer can bid on items and consult a summary of their current bids, rating and comments left by other users Seller sessions require a fee before a user is allowed to put up an item for sale Seller can specify a reserve (minimum) price for an item RUBiS contains 582 Java files and about 26K lines of code Uses MySQL and stores 7 tables In default configuration, RUBiS has 33,000 items for sale, distributed among ebay's 40 categories and 62 regions. There is an average of 10 bids per item, or 330,000 entries in the bids table. The users table has 1 million entries. 17 2004 George Candea

Why choose 20 clients? 20 concurrent clients with no think time inbetween successive requests human user typically spends in excess of 2 seconds between clicks, we believe the load placed by one of our simulated clients is equivalent to that of 100 or more real clients did not want to introduce artificial think time (the way is done, for instance, in the TPC-W benchmark) because having think time would add one more variable to the experiment and would not offer any useful insight for microreboot experiments 20 concurrent rapid clients is the threshold beyond which thrashing and other side effects would reduce throughput 18 2004 George Candea

Experimental Platform mimic what would be typical of a small Internet service Linux RedHat 9.0 JBoss and the Web tier run on an AMD Athlon XP 2600+ PC with 1.5 GB RAM Database (MySQL Max 3.23) on another identical node Sun HotSpot JVM 1.4.1. and allocate it 1 GB of RAM Client emulator on a dual P-III (2x866 MHz) with 1 GB RAM All machines interconnected by 100 Mbps Ethernet switch 19 2004 George Candea

RUBiS f-map description of the failure dependencies between RUBiS's components using automated fault-propagation inference (AFPI) majority of such dependencies in RUBiS are between the stateless servlets and EJBs AFPI information is collected during a completely automated fault-injection campaign that requires no a priori knowledge of the applications' structure or semantics For RUBiS, a simple recovery policy could be based on the fact that there is a known mapping from the URL being accessed to the action being taken (and hence the EJBs being touched) 20 2004 George Candea