Enhancing Computer System Reliability Using Fault Tree Analysis

International Journal of Recent Research and Review, Vol. VI, June 2013, ISSN 2277-8322

Enhancing Computer System Reliability Using Fault Tree Analysis

Antima Saxena (1), Tanuj Manglani (2)
(1) Department of CSE, YIT, Jaipur, India
(2) Department of Electrical Engineering, YIT, Jaipur, India
Email: saxenayit@gmail.com, idtanuj@gmail.com

Abstract - Fault tree analysis (FTA) is the qualitative and quantitative analysis of fault occurrence in a system. It is a deductive, failure-based approach in which an undesired state of the system is specified and then analyzed to find the ways in which the system can reach that state. A computer-based FTA methodology would greatly benefit the computer system. Ideally, FTA can be standardized through a computer package that reads the information contained in a process block diagram and provides automatic aids to assist engineers in constructing and analyzing fault trees. To enhance the reliability of a computer system, its fault symptoms are defined and its fault tree model is established. All the common fault causes are identified qualitatively by the Fussell algorithm and classified into 8 minimal cut sets. Finally, the probability of occurrence of the top event, the important degree of probability, and the key importance of each basic event are calculated quantitatively.

Keywords - Computer System, Fault Tree, Qualitative Analysis, Quantitative Calculation, Minimal cut set, FTA

I. INTRODUCTION

Fault tree analysis (FTA) is a diagrammatic and logical method for evaluating the probability of an accident resulting from sequences and combinations of faults and failure events. A fault tree describes a model and interprets the relations between component malfunctions and observed symptoms. It is therefore useful for understanding logically how a fault occurs. Furthermore, for given failure probabilities of the system components, the probability of the top event can be calculated.
Fault tree analysis is a powerful tool for analyzing the reliability and safety of large-scale, complicated systems. In the early 1960s, scientists at Bell Laboratories first put forward the FTA method. The Boeing Company later improved the method to make it suitable for computer processing. Many FTA software packages are now available, and FTA methods have been adopted in industries such as chemicals, light industry, electronics, and machinery manufacturing [1]. This paper focuses on the quantitative and qualitative analysis of fault trees. The test system in this paper is a computer system.

II. FAULT TREE ANALYSIS METHODS

Fault trees were first developed in the 1960s to facilitate unreliability analysis of the Minuteman missile system [2]. They provide a compact, graphic, intuitive method for analyzing system reliability. Traditional fault trees use Boolean gates to represent how component failures combine to produce system failures, and they are analyzed using cut sets (or other Boolean algebraic methods) or Monte Carlo simulation [3].

When FTA is used to analyze a system, the aim is to find all possible causes that lead to a certain unexpected event (called the top event in the following). The top event is usually the system failure or some failure mode. FTA uses a deductive method to find every combination of events that makes the top event happen. It reflects the logical relation between the computer system and the resulting top event. This relationship is represented by a figure that looks like an upside-down tree, hence the name fault tree. The fault tree shows the relationships between the events, and logical symbols such as AND, OR, and XOR are used to depict them. The two basic gate categories used in fault trees are the OR gate and the AND gate. Because the gates relate the events in the same way as Boolean expressions, all the Boolean laws can be applied to the fault tree.

The most difficult part of creating a fault tree is determining the top-level event. The selection of the top event is crucial, since the hazard analysis of the system will not be comprehensive unless fault trees are drawn for all significant top-level events. Once the top event has been defined, the next step is to determine the events related to the top event and the logical relations between them, using logical symbols to define the relations. The output of an AND gate exists only if all of its input events exist. The output of an OR gate exists provided at least one of its input events exists. The relationships between the events are standard logical relations and can therefore be expressed using any form of Boolean algebra or a truth table.

A successful FTA requires the following steps to be carried out:
1. Identify the objective of the FTA.
2. Define the top event of the fault tree.
3. Define the scope of the FTA.
4. Define the ground rules for the FTA.
5. Construct the fault tree.
6. Evaluate the fault tree.
7. Interpret and present the results.

In short, the FTA procedure is: choose the top event, construct the fault tree, and finally assess the fault tree qualitatively or quantitatively.

III. SYMBOLOGY

Symbols are the building blocks of the fault tree. Two types of symbols appear in the fault tree structure: gates and events.

Primary Event: The primary events of a fault tree are those events which, for one reason or another, have not been further developed.
Basic Event: A basic initiating fault requiring no further development or breakdown. Basic events are found on the bottom tiers of the tree.
Conditioning Event: Specific conditions or restrictions that apply to any logic gate.
Undeveloped Event: An event which is not further developed, either because it is of insufficient consequence or because information is unavailable.
External Event: An event which is normally expected to occur.

Transfer Symbols:
TRANSFER IN: Indicates that the tree is developed further at the occurrence of the corresponding TRANSFER OUT (e.g., on another page).
TRANSFER OUT: Indicates that this portion of the tree must be attached at the corresponding TRANSFER IN.

Gates:
OR GATE: Indicates that one or more input events are required to produce the output event.
AND GATE: Indicates that all input events are required to cause the output event.

For a given computer system fault tree, a set of basic events whose joint occurrence causes the top event to occur is called a cut set. Fault tree analysis begins with listing the minimal cut sets. A minimal cut set is a cut set that would not remain a cut set if any of its basic events were removed [4]. To obtain the minimal cut sets, a top-down algorithm, the Fussell-Vesely algorithm, is used [5]. Each minimal cut set consists of a combination of specific component failures, and hence the general n-component minimal cut set can be expressed as

$$M = E_1 \cdot E_2 \cdot E_3 \cdots E_n$$

where $E_1, E_2, E_3, \dots, E_n$ are component failures on the tree.

IV. FAULT TREE ANALYSIS OF COMPUTER SYSTEM

The fault tree constructed for the problem under study is shown in Fig. 1. The top event for this problem is the computer not functioning. The failure of the computer to function can be broadly classified into three parts: no power, no cold air, and RAM not working.

[Figure: top event "Computer Not Functioning" with inputs "No Power", "RAM Not Working", and "No Cold Air"]

No power, RAM not working, and no cold air are the first-level events that cause the system to fail. No power can be attributed to two causes: mains out and UPS out. The logical relationship between these two events and the no power event is represented using an AND gate. RAM not working can be attributed to three causes: dust in RAM, RAM damaged, or RAM misplaced. The logical relationship between these three events and the RAM not working event is represented using an OR gate. No cold air can be attributed to three causes: blower not working, no cooling water supply, and compressor not functioning. The logical relationship between these three intermediate events and the no cold air event is represented using an OR gate. Blower not working can be attributed to two causes: blower defective and no power to blower. The logical relationship between these two events and the blower not working event is represented using an OR gate.
No power to the blower can be attributed to two causes: mains out and genset out. The logical relationship between these two events and the no power to blower event is represented using an AND gate. Compressor not functioning can be attributed to two causes: compressor out and no power. The logical relationship between these two events and the compressor not functioning event is represented using an OR gate. No power to the compressor can be attributed to two causes: mains out and genset out. The logical relationship between these two events and the no power event is represented using an AND gate. The logical relation between the first-level intermediate events and the top event is represented using an OR gate.

Fig. 1: Fault Tree
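The gate structure described in this section can be written down as a small data structure and expanded into minimal cut sets. The following is a minimal Python sketch, not the Fussell-Vesely procedure itself: it assumes the tree contains only AND and OR gates, takes the gate inputs from the prose description above, and uses illustrative identifiers such as `mains_out` and helper names such as `cut_sets` and `minimal` that do not appear in the paper.

```python
from itertools import product

# Gate structure as described in the prose of Section IV.
# All event names are illustrative identifiers, not the paper's symbols.
TREE = {
    "computer_not_functioning": ("OR", ["no_power", "ram_not_working", "no_cold_air"]),
    "no_power": ("AND", ["mains_out", "ups_out"]),
    "ram_not_working": ("OR", ["dust_in_ram", "ram_damaged", "ram_misplaced"]),
    "no_cold_air": ("OR", ["blower_not_working", "no_cooling_water",
                           "compressor_not_functioning"]),
    "blower_not_working": ("OR", ["blower_defective", "no_power_to_blower"]),
    "no_power_to_blower": ("AND", ["mains_out", "genset_out"]),
    "compressor_not_functioning": ("OR", ["compressor_out", "no_power_to_compressor"]),
    "no_power_to_compressor": ("AND", ["mains_out", "genset_out"]),
}


def cut_sets(event):
    """Return the cut sets of `event` as a set of frozensets of basic events."""
    if event not in TREE:  # anything without a gate is a basic event
        return {frozenset([event])}
    gate, inputs = TREE[event]
    child_sets = [cut_sets(child) for child in inputs]
    if gate == "OR":
        # Any single input suffices: take the union of the inputs' cut sets.
        return set().union(*child_sets)
    # AND: every input is needed, so merge one cut set from each input.
    return {frozenset().union(*combo) for combo in product(*child_sets)}


def minimal(sets):
    """Drop every cut set that strictly contains another one (absorption law)."""
    return {s for s in sets if not any(other < s for other in sets)}


mcs = minimal(cut_sets("computer_not_functioning"))
for s in sorted(mcs, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

For the tree as described in the prose, this expansion gives eight minimal cut sets: six single events and the two pairs {mains out, UPS out} and {mains out, genset out}. The pair {mains out, genset out} is reached through both the blower and the compressor branches and is counted once, because the cut sets are stored in a set.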

The intermediate events used in this analysis are as follows:
$A_1$: there is no power supply to the computer system.
$A_2$: there is no power supply to the blower.
$A_3$: the blower is not working.
$A_4$: there is no power supply to the compressor.
$A_5$: the compressor is not working.
$A_6$: there is no cold air.
$A_7$: the RAM is not working.

Fig. 2: Fault Tree Model of Computer System

V. QUALITATIVE EVALUATION OF A FAULT TREE

The goal of qualitative analysis is to find all the fault causes, i.e. all the minimal cut sets, which is helpful in fault diagnosis and maintenance of the computer system. For the qualitative analysis of the computer system, the 8 basic events that may cause the computer system to fail are denoted $E_1, \dots, E_8$:
$E_1$: Mains out.
$E_2$: UPS out.
$E_3$: Dust in RAM.
$E_4$: RAM damaged.
$E_5$: RAM misplaced.
$E_6$: Blower defective.
$E_7$: Genset out of order.
$E_8$: Compressor out of order.

The intermediate events above take place only when one or several of the minimal cut sets occur. Hence, the relation between the intermediate events and the basic events is given by

$$A_1 = E_1 E_2$$
$$A_2 = E_1 E_7$$
$$A_3 = E_6 + (E_1 E_7)$$
$$A_4 = E_1 E_2$$
$$A_5 = E_8 + (E_1 E_4)$$
$$A_6 = (E_6 + (E_1 E_7)) + (E_8 + (E_1 E_4))$$
$$A_7 = E_3 + E_4 + E_5$$

Let the main event, computer not functioning, be represented by $F$. Then

$$F = (E_1 E_2) + (E_3 + E_4 + E_5) + ((E_6 + (E_1 E_7)) + (E_8 + (E_1 E_4)))$$

This relation shows that the top event takes place only when one or several of the cut sets occur.

VI. QUANTITATIVE CALCULATION OF PROBABILITY OF TOP EVENT

The first step in the quantitative evaluation of a fault tree is to find the structural representation of the top event in terms of the basic events, as done in the qualitative analysis. If the rate of occurrence and the fault duration of all basic events are known, and the statistical dependency [6] of each basic event is known (or assumed), then the statistical expectation or probability of the top event can be determined. For the quantitative evaluation of the fault tree of the computer system, the failure probabilities of the components are taken as follows:

TABLE I
Sr. No.   Code   Fault             Probability of Failure
1         E1     Mains Out         0.120
2         E2     UPS Out           0.095
3         E3     Dust in RAM       0.080
4         E4     RAM Damaged       0.040
5         E5     RAM Misplaced     0.090
6         E6     Blower Defective  0.070
7         E7     Genset Out        0.050
8         E8     Compressor Out    0.860

Using the minimal cut sets and Boolean algebra [7], the probability $P(F)$ of the top event, failure of the computer system, can be obtained as

$$P(F) = 1 - \prod_{j=1}^{k} \left[ 1 - P(M_j) \right]$$

where $k$ is the number of minimal cut sets, $M_j$ is the $j$-th minimal cut set, and $P(M_j)$ is the probability of that minimal cut set. Writing $q_i$ for the failure probability of basic event $E_i$ from Table I,

$$P(F) = 1 - (1 - q_1 q_2)(1 - q_3)(1 - q_4)(1 - q_5)(1 - q_6)(1 - q_1 q_7)(1 - q_8)(1 - q_1 q_4) = 1 - 0.10234 = 0.89766$$

Hence, fault tree analysis expresses the interrelation between the probability of fault occurrence and the minimal cut sets, and provides a way to study the root causes of faults, which can be used to improve the reliability [8-11] of the computer system.

VII. IMPORTANT DEGREE OF PROBABILITY

The ratio of the change in the probability of the top event to the change in the probability of basic event $E_i$ ($i = 1, \dots, 8$) is called the basic event's important degree of probability [12]:

$$I_p(i) = \frac{\partial P(F)}{\partial q_i}$$

VIII. KEY IMPORTANCE

Key importance is the relative change of $P(F)$ divided by the relative change of $q_i$. It weighs the importance of every basic event by both its sensitivity and its fault probability. The larger the key importance of a basic event, the more likely that event is to cause the top event [10]. Ordering the basic events by key importance from largest to smallest therefore makes it easy to remove certain faults quickly and to take measures against further malfunction or latent danger.
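The top-event probability of Section VI and the two importance measures can be checked numerically. The sketch below is a minimal Python illustration under stated assumptions: the eight cut sets are copied as they appear in the product for $P(F)$, the probabilities come from Table I, the cut sets are treated as independent exactly as in that product, and the derivative is approximated by a central finite difference, which is a choice made for this illustration rather than the paper's method. The function names are hypothetical.

```python
from math import prod

# Failure probabilities of basic events 1..8, taken from Table I.
q = {1: 0.120, 2: 0.095, 3: 0.080, 4: 0.040,
     5: 0.090, 6: 0.070, 7: 0.050, 8: 0.860}

# Cut sets as they appear in the product for P(F) in Section VI.
CUT_SETS = [(1, 2), (3,), (4,), (5,), (6,), (1, 7), (8,), (1, 4)]


def top_event_probability(q):
    """P(F) = 1 - prod_j (1 - P(M_j)), with P(M_j) the product of its q_i."""
    no_failure = prod(1.0 - prod(q[i] for i in cs) for cs in CUT_SETS)
    return 1.0 - no_failure


def probability_importance(q, i, h=1e-6):
    """Central finite-difference estimate of dP(F)/dq_i (illustrative choice)."""
    up, down = dict(q), dict(q)
    up[i] += h
    down[i] -= h
    return (top_event_probability(up) - top_event_probability(down)) / (2 * h)


def key_importance(q, i):
    """Relative change of P(F) per relative change of q_i."""
    return q[i] / top_event_probability(q) * probability_importance(q, i)


print(f"P(F) = {top_event_probability(q):.5f}")  # P(F) = 0.89766
for i in sorted(q, key=lambda i: key_importance(q, i), reverse=True):
    print(i, round(probability_importance(q, i), 5), round(key_importance(q, i), 5))
```

Events 1 and 4 appear in more than one cut set, and the finite difference accounts for every occurrence of such an event. Sorting by key importance puts the compressor (event 8) first, since it has by far the largest failure probability.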
Key importance is calculated with the formula

$$I_c(i) = \frac{q_i}{P(F)} \, \frac{\partial P(F)}{\partial q_i}$$

where $q_i$ is the failure probability of basic event $i$ and $\partial P(F) / \partial q_i$ is its important degree of probability $I_p(i)$.

TABLE II
Basic Event   Important degree of probability   Key importance
E1            0.01827                           0.00244
E2            0.01243                           0.00132
E3            0.11123                           0.00991
E4            0.10711                           0.00477
E5            0.11246                           0.01128
E6            0.11004                           0.00858
E7            0.01235                           0.00069
E8            0.73097                           0.70031

From the table we can conclude:

$$I_p(8) > I_p(5) > I_p(3) > I_p(6) > I_p(4) > I_p(1) > I_p(2) \approx I_p(7)$$
$$I_c(8) > I_c(5) > I_c(3) > I_c(6) > I_c(4) > I_c(1) > I_c(2) > I_c(7)$$

The larger $I_c(i)$ is, the more likely the top event is to take place. To reduce the probability of the top event, the occurrence probabilities of basic events 8, 5, 3, 6, 4, 1, 2, 7 should be cut down as much as possible, in that order of priority. Considering the significance of key importance, we should first suspect and check for the occurrence of basic events 8, 5, 3, 6, 4, 1, 2, 7, in that order, and then carry out fault diagnosis or adjust the control strategy on the basis of the value of $I_c(i)$.

IX. CONCLUSION

Fault tree analysis has been applied to predict the failure probability or failure frequency of a computer system in terms of the failure and repair parameters of the system components. A qualitative analysis has been proposed regarding the safety of the system and the identification of critical system elements if the system is to be upgraded. The qualitative analysis finds all the fault causes, represented by minimal cut sets, which is helpful in fault diagnosis and maintenance of the computer system. In this work, the key importance and the statistical expectation or probability of the top event have been determined using quantitative analysis. This expresses the interrelation between the probability of fault occurrence and the minimal cut sets, and provides a way to study the root causes of faults, which can be used to improve the reliability of the test system. For the computer system, the quantitative analysis finds that the compressor has the maximum key importance. So, to improve the reliability of the computer system, the basic event E8, i.e. the compressor, must be diagnosed first to reduce the probability of the fault.

X. REFERENCES

[1] Cao Jinhua, Cheng Kan, Introduction of Reliability Mathematics, Higher Education Press, 2006.
[2] H. A. Watson and Bell Telephone Laboratories, Launch Control Safety Study, Bell Telephone Laboratories, Murray Hill, NJ, USA, 1961.
[3] Dugan J. B., Coppit D., Developing a low-cost high-quality software tool for dynamic fault tree analysis, IEEE Transactions on Reliability, 2000, 49(1): 49-59.
[4] Chen Kai, Lu Shulan, Li Fengling, Reliability Mathematics, Jilin Education Press, 1989.
[5] Xu Renzuo, Reliability allocation for modular software system designed for multiple customers, vol. 20, no. 1, pp. 5-10, Jan. 1999.
[6] Zhang Li, Deng Zhi-liang, Human errors in complex man-machine systems, China Safety Science Journal, Transaction on Reliability, vol. 6, Jun. 1996, pp. 35-37.
[7] Hennings, W. and Kuznetsov, N. Yu., FAMOCUTN and CUTQN: Computer codes for fast analytical evaluation of large fault trees with replicated and negated gates, IEEE Trans. Reliability, 1995, pp. 368-376.
[8] Masdi Muhammad, M. Amin Abd Majid, A Case Study of Reliability Assessment for Centrifugal Pumps in a Petrochemical Plant, Fourth World Congress on Engineering Asset Management, Athens, Greece, 28-30 September 2009.
[9] Zheng Yanyan, Xu Renzuo, A Human Factors Fault Tree Analysis Method for Software Engineering, State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China, IEEE, 2008.
[10] Guoyin Liu, Shaofen Lin, Xiaoxia Jiang, Qinglin Chen, Shaohui Liu, Stimulation Analysis to Marine Crane Hydraulic System Reliability Based on Fault Tree, Ship & Ocean Engineering, 2008, 37(2): 71-72.
[11] Schneeweiss W. G., The Fault Tree Method (in the Fields of Reliability and Safety Technology), Lilole-Verlag, Berlin, 2000.
[12] Rui Quan, Baohua Tan, Shuhai Quan, Study on Fault Tree Analysis of Fuel Cell Stack Malfunction, International Conference on Measuring Technology and Mechatronics Automation, 2010.