"Reliability and MTBF Overview"




"Reliability and MTBF Overview" Prepared by Scott Speaks Vicor Reliability Engineering

Introduction

Reliability is defined as the probability that a device will perform its required function under stated conditions for a specific period of time. Predicting reliability with some degree of confidence depends heavily on correctly defining a number of parameters. For instance, choosing the distribution that matches the data is of primary importance. If the correct distribution is not chosen, the results will not be reliable. The confidence, which depends on the sample size, must be adequate to make correct decisions. Individual component failure rates must be based on a large enough population and must be relevant enough to reflect present-day normal usage. There are empirical considerations, such as determining the slope of the failure rate and calculating the activation energy, as well as environmental factors, such as temperature, humidity, and vibration. Lastly, there are electrical stressors such as voltage and current.

Reliability engineering can be somewhat abstract in that it involves a great deal of statistics; yet it is engineering in its most practical form. Will the design perform its intended mission? Product reliability is seen as a testament to the robustness of the design as well as the integrity of the quality and manufacturing commitments of an organization. This paper explains the basic concepts of reliability as they apply to power supplies manufactured by Vicor.

The Bathtub Curve

The life of a population of units can be divided into three distinct periods. Figure 1 shows the reliability bathtub curve, which models the cradle-to-grave instantaneous failure rate vs. time. The first period runs from the start of the curve to the point where it begins to flatten out. It is characterized by a decreasing failure rate and corresponds to the early life of a population of units. The weaker units die off, leaving a more robust population. This first period is also called the infant mortality period. The next period is the flat portion of the graph. It is called the normal life. Failures occur in a more random sequence during this time. It is difficult to predict which failure mode will manifest, but the rate of failures is predictable. Notice that the curve is flat here, so the failure rate is constant. The third period begins at the point where the slope begins to increase and extends to the end of the graph. This is what happens when units become old and begin to fail at an increasing rate.

Figure 1. Reliability Bathtub Curve

Early Life Period

Vicor uses many methods to ensure the integrity of design. Some of the design techniques include: burn-in (to stress devices under constant operating conditions); power cycling (to stress devices under the surges of turn-on and turn-off); temperature cycling (to mechanically and electrically stress devices over the temperature extremes); vibration; testing at the thermal destruct limits; highly accelerated stress and life testing; etc. Vicor is well aware that in spite of using all these design tools, as well as manufacturing tools such as six sigma and quality improvement techniques, there will still be some early failures due to the inability to control processes at the molecular level. There is always the risk that, although the most up-to-date techniques are used in design and manufacture, early failures will occur.

In order to offset these risks, especially in newer products, Vicor consumes some of the early useful life of a module via stress screening. This technique allows the units to begin their operating life somewhere closer to the flat portion of the bathtub curve instead of at the initial peak, which represents the highest risk of failure. Operating life is consumed via burn-in and temperature cycling. The amount of screening needed for acceptable quality is a function of the process grade as well as history. M-Grade modules are screened more than I-Grade modules, and I-Grade modules are screened more than C-Grade units.

Useful Life Period

As the product matures, the weaker units die off, the failure rate becomes nearly constant, and modules enter what is considered the normal life period. This period is characterized by a relatively constant failure rate. The length of this period is referred to as the system life of a product or component. It is during this period that the lowest failure rate occurs. Notice how the amplitude of the bathtub curve is at its lowest during this time. The useful life period is the most common time frame for making reliability predictions. The failure rates calculated from MIL-HDBK-217 and Telcordia TR-332 apply to this period and to this period only.

Wearout Period

As components begin to fatigue or wear out, failures occur at increasing rates. Wearout in power supplies is usually caused by the breakdown of electrical components that are subject to physical wear and to electrical and thermal stress. In this area of the graph, the MTBFs or FIT rates calculated for the useful life period no longer apply. A product with an MTBF of 10 years can still exhibit wearout in two years.
No parts count method can predict the time to wearout of components. Electronics in general, and Vicor power supplies in particular, are designed so that the useful life extends past the design life. This way wearout should never occur during the useful life of a module. For instance, most electronics are obsolete within 20 years; Vicor module MTBFs may extend 35 years or longer.

Weibull Analysis

Weibull analysis can be used as a method of determining where a population of modules is on the bathtub curve. In the form used here, the Weibull distribution is described by the shape parameter β (beta), the scale parameter η (eta, the characteristic life), and time t. The Weibull distribution is given by:

$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-\left(t/\eta\right)^{\beta}}$$

The Weibull parameter β (beta) is the slope. It signifies the rate of failure. When β < 1, the Weibull distribution models early failures of parts. When β = 1, the Weibull distribution reduces to the exponential distribution. The exponential distribution is the model for the useful life period, signifying that random failures are occurring. When β = 3, the Weibull distribution approximates the normal distribution. This is the early wearout time. When β = 10, rapid wearout is occurring.

Vicor uses Weibull analysis extensively because this distribution allows modeling to be done with a minimal number of failures. As Dr. Abernethy said in his book The New Weibull Handbook, "Use of the Weibull distribution provides accurate failure analysis and risk predictions with extremely small samples using a simple and useful graphical plot. Solutions are possible at the earliest stage of a problem without the requirement to crash a few more." [1]

The results of various values of β on the Weibull analysis can be seen in the graph below:

[Figure: Weibull curves for Beta = 0.5, Beta = 1, Beta = 3, and Beta = 10, together with the combined curve]

Notice how, if all curves are combined, the resultant graph is similar to a bathtub curve.

Accelerated Life Testing

"Accelerated life testing employs a variety of high stress test methods that shorten the life of a product or quicken the degradation of the product's performance. The goal of such testing is to efficiently obtain performance data that, when properly analyzed, yields reasonable estimates of the product's life or performance under normal conditions." [2]

Vicor uses a variety of high stress test methods that shorten the life of a product and/or quicken the degradation of product performance. This induces early failures that would otherwise manifest themselves in the early years of a product's life, and also allows issues related to design tolerances to be discovered before volume manufacturing.
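Returning to the Weibull distribution, the effect of the slope β on the failure rate can be checked numerically. The following is a minimal sketch, not Vicor's analysis tooling: it assumes the standard two-parameter density with shape β and scale η, and it uses the standard relation that the instantaneous failure rate is f(t)/R(t) with R(t) = exp(-(t/η)^β). The function names and the 1,000-hour characteristic life are hypothetical, chosen only for illustration.

```python
import math


def weibull_pdf(t, beta, eta):
    """Weibull density f(t) for shape beta and scale eta (all positive)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta, and eta must all be positive")
    z = t / eta
    return (beta / eta) * z ** (beta - 1) * math.exp(-(z ** beta))


def weibull_failure_rate(t, beta, eta):
    """Instantaneous failure rate h(t) = f(t) / R(t), R(t) = exp(-(t/eta)**beta)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta, and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


if __name__ == "__main__":
    eta = 1000.0  # hypothetical characteristic life in hours (assumption)
    for beta in (0.5, 1.0, 3.0, 10.0):
        early = weibull_failure_rate(100.0, beta, eta)
        late = weibull_failure_rate(900.0, beta, eta)
        if early > late:
            trend = "decreasing (early failures)"
        elif early == late:
            trend = "constant (useful life)"
        else:
            trend = "increasing (wearout)"
        print(f"beta = {beta}: failure rate is {trend}")
```

Under these assumptions the failure rate falls with time for β < 1, stays flat for β = 1, and rises for β > 1, which is the same progression as the three regions of the bathtub curve.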

The type of stressor and the time under test are both used to determine the normal lifetime. There are various stressors including, but not limited to, heat, humidity, temperature, vibration, and load. The effect of these stressors can be mathematically determined. Vicor uses accelerated testing on all new products.

The equation below is used to model acceleration due to temperature and is referred to as the Arrhenius equation. The Arrhenius equation relates how increased temperature accelerates the aging of a product as compared to its normal operating temperature.

$$A_f = e^{\frac{E_a}{k}\left(\frac{1}{T_u} - \frac{1}{T_t}\right)}$$

Where:
$A_f$ = acceleration factor
$E_a$ = activation energy in electron-volts (eV)
$k$ = Boltzmann's constant ($k = 8.617 \times 10^{-5}$ eV/K, where K is kelvin)
$T_u$ = reference junction temperature, in kelvin (K = °C + 273)
$T_t$ = junction temperature during test, in kelvin
$e$ = 2.71828 (base of the natural logarithms)

Humidity is also a stressor. The equation below (Hallberg-Peck) models the combined effect of temperature and humidity on product life.

$$A_f = \left(\frac{RH_t}{RH_u}\right)^{3} e^{\frac{E_a}{k}\left(\frac{1}{T_u} - \frac{1}{T_t}\right)}$$

Where:
$RH_u$ = use environment relative humidity
$RH_t$ = test environment relative humidity

The stressor that has the most profound effect on product life is thermal cycling. The acceleration factor due to thermal cycling is given by the Coffin-Manson equation below.

$$A_F = \left(\frac{\Delta T_l}{\Delta T_f}\right)^{1.9} \left(\frac{F_f}{F_l}\right)^{1/3} e^{\frac{E_a}{k}\left(\frac{1}{T_{\max,f}} - \frac{1}{T_{\max,l}}\right)}$$

Where:
$\Delta T_l$ = lab temperature difference between the highest and lowest operating temperature
$\Delta T_f$ = field temperature difference between the on and off states
$F_f$ = cycle frequency in the field (cycles/24 hours); minimum number of six
$F_l$ = cycle frequency in the lab; minimum number of six, because most failures occur in the first four hours
$T_{\max,f}$, $T_{\max,l}$ = maximum temperature in the field and in the lab, respectively, in kelvin
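The three acceleration models above can be evaluated directly. The sketch below is illustrative only: the function names, argument order, and sample values are assumptions, not part of the original paper. Temperatures are passed in degrees Celsius and converted with 273.15 (the text rounds this to 273).

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K, as given in the text


def to_kelvin(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Arrhenius acceleration factor for use vs. test junction temperature."""
    t_u, t_t = to_kelvin(t_use_c), to_kelvin(t_test_c)
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_u - 1.0 / t_t))


def peck_af(ea_ev, t_use_c, t_test_c, rh_use, rh_test):
    """Hallberg-Peck factor: humidity ratio cubed times the Arrhenius term."""
    return (rh_test / rh_use) ** 3 * arrhenius_af(ea_ev, t_use_c, t_test_c)


def coffin_manson_af(dt_lab, dt_field, f_field, f_lab, ea_ev, t_max_field_c, t_max_lab_c):
    """Thermal cycling factor in the form given above (exponents 1.9 and 1/3)."""
    thermal = arrhenius_af(ea_ev, t_max_field_c, t_max_lab_c)
    return (dt_lab / dt_field) ** 1.9 * (f_field / f_lab) ** (1.0 / 3.0) * thermal


if __name__ == "__main__":
    # Hypothetical conditions: 0.7 eV, 55 C in use, 125 C under test.
    print("Arrhenius:", arrhenius_af(0.7, 55.0, 125.0))
    print("Hallberg-Peck:", peck_af(0.7, 55.0, 85.0, rh_use=40.0, rh_test=85.0))
    print("Coffin-Manson:", coffin_manson_af(165.0, 40.0, 6.0, 24.0, 0.7, 60.0, 125.0))
```

Each function returns 1 when test and use conditions are identical, which is a quick sanity check on the sign convention in the exponent.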

Activation energy is derived from empirical data gathered during accelerated testing, from the slope of the failure rate measured at two different stress levels. Activation energy represents the effect that the applied stress will have on the product under test, the stress factor being heat, voltage, current, or vibration. A large activation energy indicates that the applied stress will have a large effect on the life of the product. The activation energies that Vicor uses when performing parts count calculations are based on MIL-HDBK-217F.

Mean Time Between Failures (MTBF)

Reliability is quantified as MTBF (Mean Time Between Failures) for repairable product and MTTF (Mean Time To Failure) for non-repairable product. A correct understanding of MTBF is important. A power supply with an MTBF of 40,000 hours does not mean that the power supply should last for an average of 40,000 hours. According to the theory behind the statistics of confidence intervals, the statistical average becomes the true average as the number of samples increases. An MTBF of 40,000 hours (about 4.6 years of continuous operation) for one module becomes 40,000/2 hours for two modules and 40,000/4 hours for four modules. Sometimes failure rates are measured in percent failed per million hours of operation instead of MTBF. Another common unit is the FIT: one FIT is equivalent to one failure per billion device hours, which is equivalent to an MTBF of 1,000,000,000 hours.

The formula for calculating the MTBF is:

$$\mathrm{MTBF} = T/R$$

Where:
$T$ = total time
$R$ = number of failures

MTTF stands for Mean Time To Failure. To distinguish between the two, the concept of suspensions must first be understood. In reliability calculations, a suspension occurs when a destructive test or observation has been completed without observing a failure. MTBF calculations do not consider suspensions whereas MTTF does. MTTF is the number of total hours of service of all devices divided by the number of devices. It is only when all the parts fail with the same failure mode that MTBF converges to MTTF.

$$\mathrm{MTTF} = T/N$$

Where:
$T$ = total time
$N$ = number of units under test

Example: Suppose 10 devices are tested for 500 hours. During the test 2 failures occur. The estimate of the MTBF is:

$$\mathrm{MTBF} = \frac{10 \times 500}{2} = 2{,}500 \text{ hours/failure}$$
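The two definitions above can be written out as a short calculation. This is a minimal sketch using the formulas exactly as the text states them (MTBF = T/R and MTTF = T/N); the function names are hypothetical, and total time is assumed to be the number of units multiplied by the hours each unit was tested.

```python
def mtbf_hours(units, hours_per_unit, failures):
    """MTBF = T / R, with T = total device-hours and R = number of failures."""
    if failures <= 0:
        raise ValueError("MTBF = T/R needs at least one observed failure")
    return units * hours_per_unit / failures


def mttf_hours(units, hours_per_unit):
    """MTTF = T / N, with N = number of units under test (as defined in the text)."""
    if units <= 0:
        raise ValueError("at least one unit is required")
    return units * hours_per_unit / units


if __name__ == "__main__":
    # The example from the text: 10 devices, 500 hours, 2 failures.
    print("MTBF:", mtbf_hours(10, 500, 2), "hours/failure")
    print("MTTF:", mttf_hours(10, 500), "hours/failure")
```

With zero observed failures the point estimate T/R is undefined, which is why the sketch raises an error in that case instead of returning a value.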

For MTTF, the same test data give:

$$\mathrm{MTTF} = \frac{10 \times 500}{10} = 500 \text{ hours/failure}$$

If the MTBF is known, one can calculate the failure rate as the inverse of the MTBF. The formula for the failure rate (λ) is:

$$\lambda = \frac{1}{\mathrm{MTBF}} = \frac{r}{T}$$

where r is the number of failures and T is the total time.

Once an MTBF is calculated, what is the probability that any one particular Vicor module will be operational at a time equal to the MTBF? We have the following equation:

$$R(t) = e^{-t/\mathrm{MTBF}}$$

But when t = MTBF:

$$R(t) = e^{-1} = 0.3679$$

This tells us that the probability that any one particular module will survive to its calculated MTBF is only 36.8%.

Reliability Prediction Methods

A lot of time has been spent on developing procedures for estimating the reliability of electronic equipment. There are generally two categories: (1) predictions based on individual failure rates, and (2) demonstrated reliability based on operation of equipment over time. Prediction methods are based on component data from a variety of sources: failure analysis, life test data, and device physics. For some calculations (e.g. military applications) MIL-HDBK-217 is used, which is considered to be the standard reliability prediction method. Vicor also performs calculations using Telcordia Technical Reference TR-332, Reliability Prediction Procedure for Electronic Equipment. Both of these prediction methods have several assumptions in common, e.g. constant failure rate, the use of thermal and stress acceleration factors, quality factors, and use conditions. The table below highlights the differences between the two methods of calculation.

| | Telcordia | MIL-HDBK-217 |
|---|---|---|
| Recognition and acceptance | Popularity growing internationally, but has not yet been completely embraced by the international community. | Known and accepted around the world. |
| Concentration | Designed to focus on telecommunications. | Designed for both military and commercial. |
| Calculation | Generally failure rates will be seen as more positive. Requires fewer parameters for components. | More parameters that vary widely for specific components. |
| Test Data | Includes capability to consider burn-in data, laboratory test data, and field data. | Includes voltage and power stresses. |
| Multiplier | Calculates failure rate in FITs (failures in time). | Calculates MTBF. |
| Parts | Provides models for gyroscopes, batteries, heaters, coolers, and computer systems. | Provides models for printed circuit boards, lasers, SAWs, magnetic bubble memories, and tubes. |
| Environment | Supports a limited number of environments. | Supports most types of environments. |
| Quality Levels | 4 standard quality levels for all component types. | Quality levels are more component specific. |

Knowing that there is a difference is important. The customer can determine which method to use according to the application. Numbers obtained, as will be seen in the next section, vary widely in most instances.

Reliability Demonstration Methods

When predicting and demonstrating the reliability of a product, the exact failure time can never be known. Notwithstanding this fact, we are required to hypothesize regarding our confidence in the integrity of our numbers. The chi-square (χ²) distribution is often used in this case because it is a probability distribution that relates the observed to the expected values. The relationship between failure rate and the chi-square distribution is as follows:

$$\lambda = \frac{\chi^2(\alpha, \nu)}{2 \times \text{time} \times A_F}$$

Where:
$\alpha$ = (100 − CL)/100
CL = confidence level (60%, 90%)
$\nu$ = degrees of freedom = 2 × number of failures + 2
$A_F$ = acceleration factor, as determined by one of the methods discussed earlier

When estimating failure rates, Vicor uses acceleration factors to determine use failure rates based on lab failure rates.
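As an illustration of the chi-square relationship above, the sketch below computes a failure rate at a given confidence level using only the Python standard library. Several things here are assumptions rather than part of the paper: "time" is taken to mean total device-hours on test, χ²(α, ν) is taken to be the value exceeded with probability α, and the helper names are hypothetical. Because ν = 2 × failures + 2 is always even, the chi-square tail probability has a closed form (a finite Poisson-type sum), and the quantile is found by bisection.

```python
import math


def chi2_upper_quantile_even_dof(alpha, dof):
    """Return x with P(chi-square(dof) > x) = alpha, for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    k = dof // 2

    def upper_tail(x):
        # For dof = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, k):
            term *= half / i
            total += term
        return math.exp(-half) * total

    lo, hi = 0.0, 1.0
    while upper_tail(hi) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):  # bisection; the tail is decreasing in x
        mid = (lo + hi) / 2.0
        if upper_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def demonstrated_failure_rate(failures, device_hours, confidence, acceleration_factor=1.0):
    """Failure rate per hour: chi2(alpha, 2r + 2) / (2 * time * A_F)."""
    alpha = 1.0 - confidence
    dof = 2 * failures + 2
    chi2_value = chi2_upper_quantile_even_dof(alpha, dof)
    return chi2_value / (2.0 * device_hours * acceleration_factor)


if __name__ == "__main__":
    # Hypothetical test: 2 failures in 5,000 device-hours, acceleration factor 50.
    for cl in (0.60, 0.90):
        rate = demonstrated_failure_rate(2, 5000.0, cl, acceleration_factor=50.0)
        print(f"CL = {cl:.0%}: failure rate = {rate:.3e} per hour, MTBF = {1.0 / rate:,.0f} hours")
```

A higher confidence level gives a larger chi-square value and therefore a higher (more conservative) failure rate for the same test data, which is one reason quoted numbers differ between suppliers.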

Conclusion

This paper has summarized Vicor's view on power supply reliability. We hope it has provided some insight into the details behind calculated MTBF numbers. Reliability numbers can vary greatly between suppliers, and only by understanding the assumptions and methodology used can meaningful comparisons be made. In summary, when properly applied and operated in accordance with Vicor's Applications Manual, Vicor Power Components are some of the most reliable products available.

References

1. R. B. Abernethy, The New Weibull Handbook, 4th edition, Abernethy, 2000.
2. Reliability Analysis Center, Reliability Toolkit: Commercial Practices Edition, Rome Laboratory.