Reliability of Technical Systems


Main Topics
1. Short Introduction, Reliability Parameters: Failure Rate, Failure Probability, etc.
2. Some Important Reliability Distributions
3. Component Reliability
4. Introduction, Key Terms, Framing the Problem
5. System Reliability I: Reliability Block Diagram, Structure Analysis (Fault Trees), State Model
6. System Reliability II: State Analysis (Markovian chains)
7. System Reliability III: Dependent Failure Analysis
8. Data Collection, Bayes Theorem, Static Redundancy
9. Combined Redundancy, Dynamic Redundancy; Advanced Methods for Systems Modeling and Simulation I: Petri Nets
10. Advanced Methods for Systems Modeling and Simulation II: Object-oriented modeling and MC modeling
11. Human Reliability Analysis
12. Software Reliability, Fault Tolerance
13. Case study: Building a Reliable System

Hardware and Software Faults
Mechanical / hardware faults are design, manufacturing or operation faults. They are caused by disturbance or wear-out, where the physical rules are well known. Therefore these faults and their impacts on the failure rate are better understood than software faults.
Software faults are (nearly always) design faults, which exist as latent faults during system operation. Well-tested programs may still contain a high number of design faults, but exhibit a low failure rate.
HS 2010 / Irene Eusgeld 3

Countermeasures Against Software Faults
1. Removal of the failure: system reset (depends on environment). The failure can, but need not, occur again.
2. Countermeasures against future failures: error by-passing (questionable input data and/or commands are no longer used and are substituted as far as possible); design fault tolerance by diverse design.
3. Removal of the design fault: program correction (i.e. software repair): fault localization in the program, then a change of the program that removes the design fault without inserting a new one.

Reliability
Software is not destroyed by a design fault and is thus repairable, in principle (program correction).
Reliability measure to be quantified: failure rate z(t).
Reliability measures derived from it:
Availability V: $V = \frac{\mu}{\lambda + \mu}$, where the failure rate z(t) = λ and the repair rate μ are constant (at least approximately for a limited period of operation).
Reliability R(t) for time t from process start or restart, respectively: $R(t) = e^{-\int_0^t z(x)\,dx}$. If the failure rate z(t) = λ is constant: $R(t) = e^{-\lambda t}$.
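As a minimal sketch of the two derived measures, assuming constant rates λ and μ, the formulas can be evaluated directly. The function names and the numeric values are illustrative assumptions, not part of the lecture.

```python
import math

def availability(failure_rate: float, repair_rate: float) -> float:
    """Availability V = mu / (lambda + mu) for constant failure and repair rates."""
    return repair_rate / (failure_rate + repair_rate)

def reliability(failure_rate: float, t: float) -> float:
    """Reliability R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-failure_rate * t)

# Illustrative values only: one failure per 1000 h, one repair per 10 h.
lam, mu = 1e-3, 1e-1
print(f"V        = {availability(lam, mu):.4f}")
print(f"R(100 h) = {reliability(lam, 100.0):.4f}")
```

Because R(t) is measured from process start or restart, t is reset to zero after every restart in this sketch.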

Fault Locations and their Correction (1/2)
Number of fault locations in a program: s(t)
Number of fault locations where a correction or improvement has been attempted (successful or not): b(t)
$s(t) + b(t) \ge s(0)$ holds, where s(0) is the number of fault locations at the beginning of operation; ">" applies if some attempts have not been successful.
Improvement portion a: if an improvement is tried after each software failure, a = 1 holds. a > 1 is impossible, since we assume that faults can only be improved after they have been detected through a failure.

Fault Locations and their Correction (2/2)
Note that the number of fault locations in a program cannot always be quantified without ambiguity. Fault locations may overlap or spread over a collection of statements. Semantic faults cannot be uniquely assigned to a syntactic structure.
The improvement portion a is determined by the organization of software maintenance. During the test phase nearly all failures are analyzed and improved (a = 1). In a later operation phase, the user and the maintenance personnel are separated from each other. Therefore, the maintenance personnel usually collect a number of (possibly identical or similar) failure reports before an improvement is attempted (a < 1).

Problems of Software Design Fault Modeling
In complex software the interactions between various modules cannot be covered by a simple model: a software system cannot be arbitrarily subdivided into independent components. Consequently, software components can be "big" modules.
The failure rate depends not only on the component (the program), but also on its environment (application, input data, load). Consequently, reliability measures cannot be carried over from one application area to another.
Due to the design fault improvements, the failure rate is decreasing rather than constant. A constant failure rate may only be assumed for the time interval between two program improvements.

Can software faults be modeled by analogy to hardware faults?
[Figure: bathtub curve]
How can the three phases of the bathtub curve be interpreted in this case?

Modeling Software Faults by Analogy to Hardware Faults
The three phases of the so-called bathtub curve could also be meaningful for software:
Early faults are design faults that remained undiscovered during the testing phase and cause failures during early operation.
Random faults occur sporadically in the middle of the software lifetime, where "improvements" remove approximately the same number of design faults as are newly inserted into the software.
Wear-out is, of course, impossible in software. However, when the environment changes over time or the software is used for different applications, the failure rate may increase similarly to wear-out.
On the one hand, there are analogies and similarities in the interpretation of early faults and random faults. On the other hand, there are many differences in the interpretation of the late phase.


Objectives of Diverse Design
increased probability of generating a correct variant
a majority of correct variants
independent design faults in the variants, where the design faults are not activated by the same input data
Diversity does not guarantee that these goals are met. Even a decrease in reliability is possible. However, an improved reliability can be expected ("the principle of hope"). Diversity cannot substitute for the efforts to make the software more perfect.
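One common way to exploit "a majority of correct variants" is to run all variants on the same input and accept the result that a strict majority agrees on. The sketch below assumes this voting scheme; the variant functions and the injected design fault are hypothetical and only serve as an illustration.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

def majority_vote(variants: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every variant on x; return the strict-majority result, or None if there is none."""
    results = [variant(x) for variant in variants]
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

# Three hypothetical variants of "square a number"; the third has a design fault.
def square_a(x: int) -> int:
    return x * x

def square_b(x: int) -> int:
    return x ** 2

def square_c(x: int) -> int:
    return x * x if x < 10 else x * x + 1   # faulty for x >= 10

variants = [square_a, square_b, square_c]
print(majority_vote(variants, 12))   # the faulty variant is outvoted
```

The voter only helps when the design faults of the variants are not activated by the same input. If two of the three variants fail identically, the wrong value wins the vote, which is why diversity does not guarantee an improvement.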


Back-to-back testing involves cross-comparison of all responses obtained from functionally equivalent software components. Back-to-back testing can remain an efficient way of detecting failures even when the probability of identical and wrong responses from all participating versions is very close to one.
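The cross-comparison can be sketched as follows: every functionally equivalent variant receives the same inputs, and each input on which the responses differ is reported. The function name, the variants and the injected fault are hypothetical, chosen only to keep the sketch small.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def back_to_back(variants: Dict[str, Callable[[int], int]],
                 inputs: Sequence[int]) -> List[Tuple[int, Dict[str, int]]]:
    """Return every input on which the variants disagree, with all their responses."""
    discrepancies = []
    for x in inputs:
        responses = {name: f(x) for name, f in variants.items()}
        if len(set(responses.values())) > 1:      # at least two different answers
            discrepancies.append((x, responses))
    return discrepancies

# Hypothetical variants of an absolute-value routine; abs_c mishandles x == -1.
variants = {
    "abs_a": abs,
    "abs_b": lambda x: x if x > 0 else -x,
    "abs_c": lambda x: x if x >= -1 else -x,
}
report = back_to_back(variants, range(-3, 4))
for x, responses in report:
    print(x, responses)
```

A reported discrepancy shows only that at least one variant failed on that input; deciding which variant is wrong remains a separate analysis step.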


Two simplified fault models:
Linear Model (somewhat optimistic)
Exponential Model (somewhat pessimistic)
z(b): failure rate; b: number of fault locations
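The following sketch assumes one plausible form for each model; these forms and the names z0 (initial failure rate) and s0 (initial number of fault locations) are assumptions, not the lecture's definitions. Here b counts the fault locations that have been improved, the linear model is taken as z(b) = z0 · (1 − b/s0), and the exponential model as z(b) = z0 · exp(−b/s0). Under these assumptions the exponential rate is never below the linear one for the same b, since exp(−x) ≥ 1 − x, which is in line with the labels "optimistic" and "pessimistic".

```python
import math

def z_linear(b: int, z0: float, s0: int) -> float:
    """Assumed linear model: the rate drops by z0/s0 per improved fault location."""
    return z0 * max(0.0, 1.0 - b / s0)

def z_exponential(b: int, z0: float, s0: int) -> float:
    """Assumed exponential model: each improvement scales the rate by exp(-1/s0)."""
    return z0 * math.exp(-b / s0)

# Illustrative values only: initial rate per hour and initial number of fault locations.
z0, s0 = 0.02, 50
for b in (0, 10, 25, 50):
    print(b, round(z_linear(b, z0, s0), 5), round(z_exponential(b, z0, s0), 5))
```

In the assumed linear model the failure rate reaches zero once all s0 fault locations are improved, while in the assumed exponential model a residual failure rate always remains.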


Linear Model: What time is required to detect and to improve N design faults?

Linear Model: How many improvements must be made in order to reduce the failure rate from $z_2$ down to $z_3$?


Exponential Model: How many improvements must be made in order to reduce the failure rate from $z_2$ down to $z_3$?
Exponential Model: What time is required to detect and to improve N design faults?
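Both questions can be explored numerically once a concrete model form is fixed. The sketch below is built on assumptions that are not taken from the lecture: a linear model z(b) = z0 · (1 − b/s0), an exponential model z(b) = z0 · exp(−b/s0), an improvement portion a = 1 with every improvement successful, and a constant failure rate between two improvements, so that the mean waiting time for the next failure is 1/z(b). All names and numbers are illustrative.

```python
import math
from typing import Callable

def improvements_linear(z2: float, z3: float, z0: float, s0: int) -> float:
    """Improvements needed to go from rate z2 to z3 if z(b) = z0 * (1 - b/s0)."""
    return s0 * (z2 - z3) / z0

def improvements_exponential(z2: float, z3: float, s0: int) -> float:
    """Improvements needed to go from rate z2 to z3 if z(b) = z0 * exp(-b/s0)."""
    return s0 * math.log(z2 / z3)

def expected_time(n: int, rate: Callable[[int], float]) -> float:
    """Expected time to detect and improve n faults: sum of mean waiting times 1/z(b)."""
    return sum(1.0 / rate(b) for b in range(n))

z0, s0 = 0.02, 50   # illustrative initial rate (per hour) and fault locations
print(improvements_linear(0.01, 0.005, z0, s0))
print(improvements_exponential(0.01, 0.005, s0))
print(expected_time(10, lambda b: z0 * (1 - b / s0)))
print(expected_time(10, lambda b: z0 * math.exp(-b / s0)))
```

Under these assumptions, halving the failure rate needs a fixed number of improvements in the exponential model (s0 · ln 2), whereas in the linear model the number depends on the absolute difference between the two rates.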


Literature
K. Echtle. Fault-Tolerant Software Systems. Lecture script, Informatik, University of Duisburg-Essen.
J. D. Musa, A. Iannino, and K. Okumoto. Software Reliability: Measurement, Prediction, Application. McGraw-Hill, 1987. ISBN 0-07-044093-X.