Basics of Traditional Reliability

Where we are going Basic Definitions Life and ties of a Fault Reliability Models N-Modular redundant systes

Definitions RELIABILITY: SURVIVAL PROBABILITY When repair is costly or function is critical AVAILABILITY: THE FRACTION OF TIME A SYSTEM MEETS ITS SPECIFICATION When service can be delayed or denied REDUNDANCY: EXTRA HARDWARE, SOFTWARE, TIME FAILSAFE: SYSTEM FAILS TO A KNOWN SAFE STATE i.e. All red traffic signals

Stages in Syste Developent STAGE ERROR SOURCES ERROR DETECTION Specification Algorith Design Siulation & design Foral Specification Consistency checks Prototype Algorith design Stiulus/response Wiring & assebly Testing Tiing Coponent Failure Manufacture Wiring & assebly Syste testing Coponent failure Diagnostics Installation Assebly Syste Testing Coponent failure Diagnostics Field Operation Coponent failure Diagnostics Operator errors Environental factors

Cause-Effect Sequence and Duration FAILURE: FAULT: ERROR: coponent does not provide service a defect within a syste a deviation fro the required operation of the syste or subsyste (anifestation of a fault) DURATION: Transient- Interittent- Peranent- design errors, environent repair by replaceent repair by replaceent

Basic Steps in Fault Handling Fault Confineent Fault Detection Fault Masking Retry Diagnosis Reconfiguration Recovery Restart Repair Reintegration

MTBF -- MTTD -- MTTR Availability = MTBF MTBF + MTTR

First predictive reliability odels - Von Braun Wernher Von Braun - Geran Rocket Engineer, WWII V1 was 100% Unreliable Fixed weakest link - still unreliable Eric Pieruschka - Geran Matheatician 1/x^n - for identical coponents Rs=R1 x R2 x x Rn (Lusser s law)

Serial Reliability N R(t)= Π R i (t) i =1 Thus building a serially reliable syste is extraordinarily difficult and expensive. For exaple, if one were to build a serial syste with 100 coponents each of which had a reliability of.999, the overall syste reliability would be 0.999 100 = 0.905

Reliability of a syste of coponents 1 3 4 2 5 Φ(x)= 1, functioning when state vector x {0, failed when state vector x Φ(x)= ax(x 1,x 2 )ax(x 3 x 4,x 5 ) Minial path set: inial set of coponents whose functioning ensures the functioning of the syste {1,3,4} {2,3,4} {1,5} {2,5}

Parallel Reliability N R(t)= 1-Π [1-R i (t)] i =1 Consider a syste built with 4 identical odules which will operate correctly provided at least one odule is operational. If the reliability of each odule is.95, then the overall syste reliability is: 1-[1-.95] 4 = 0.99999375 In this way we can build reliable systes fro coponents that are less than perfectly reliable - for a cost.

Parallel - Serial reliability 1 3 4 2 5 Total reliability is the reliability of the first half, in serial with the second half. Given that R1=.9, R2=.9, R3=.99, R4=.99, R5=.87 Rt=[1-(1-.9)(1-.9)][1-(1-.87)(1-(.99.99))] =.987

Coponent Reliability Model But It isn t quite so straight forward... During useful life coponents exhibit a constant failure rate λ. Accordingly, the reliability of a device can be odeled using an exponential distribution. R(t) = e -λt

N-Modular redundant systes Redundant syste ipleentations typically use a voting ethod to deterine which outputs are correct. This voting overhead eans that true parallel odule reliability is typically only approached R = R M. of. N 3. of.5 R 5 ( t) ( t) + = 0.9988 ( t) 2 = i= 0 = 10(0.95) = 5R 3 N M i= 0 5! ( ) R (5 i)! i! 4 ( t)[1 15(0.95) N! ( ) R ( N i)! i! R 4 5 i ( t)] + + ( t)[1 10R 6(0.95) 3 5 N i R ( t)] ( t)[1 ( t)[1 5 R ( t)] R 2 ( t)] Consider a 5 odule syste requiring 3 correct odules, each with a reliability of 0.95 (exaple 7.9). i

Conclusions The coon techniques for fault handling are fault avoidance, fault detection, asking redundancy, and dynaic redundancy. Any reliable syste will have its failure response carefully built into it, as soe copleentary set of actions and responses. Syste reliability can be odeled at a coponent level, assuing the failure rate is constant (exponential distribution). Reliability ust be built into the project fro the start.