Table of Contents

THE USE OF WEIBULL IN DEFECT DATA ANALYSIS

INTRODUCTION
    Information Sources
    Application to Sampled Defect Data
DATA
    Quality of Data
    Quantity of Data
THE MECHANICS OF WEIBULL ANALYSIS
    The Value of Analysis
    Evaluating the Weibull Parameters
INTERPRETATION OF WEIBULL OUTPUT
    Concept of Hazard
    Scale Parameter or Characteristic Life
    Location Parameter or Minimum Life
PRACTICAL DIFFICULTIES WITH WEIBULL PLOTTING
    Scatter
    Extrapolation
    Multi-Modal Failures
    Confidence Limits
    Censoring of Sample Data
COMPARISON WITH HAZARD PLOTTING
CONCLUSIONS
TWO CYCLE WEIBULL PAPER
PROGRESSIVE EXAMPLE OF WEIBULL PLOTTING
ESTIMATION OF WEIBULL LOCATION PARAMETER
EXAMPLE OF A 3-PARAMETER WEIBULL PLOT
THE EFFECT OF SCATTER
95% CONFIDENCE LIMITS FOR WEIBULL
WEIBULL PLOT OF MULTIPLY-CENSORED DATA
THE USE OF WEIBULL IN DEFECT DATA ANALYSIS

1 INTRODUCTION

These notes give a brief introduction to Weibull analysis and its potential contribution to equipment maintenance and lifing policies. Statistical terminology has been avoided wherever possible and those terms which are used are explained, albeit briefly.

Weibull analysis originated from a paper (Reference 1) published in 1951 by a Swedish mechanical engineer, Professor Waloddi Weibull. His original paper did little more than propose a multi-parameter distribution, but it became widely appreciated and was shown by Pratt and Whitney in 1967 to have some application to the analysis of defect data.

1.1 Information Sources

The definitive statistical text on Weibull is cited at Reference 2, and publications closer to the working level are given at References 3 and 4. A set of British Standards, BS 5760 Parts 1 to 3, covering a broad spectrum of reliability activities, is being issued. Part 1, on Reliability Programme Management, was issued in 1979 but is of little value here except for its comments on the difficulties of obtaining adequate data. Part 2 (Reference 5) contains valuable guidance for the application of Weibull analysis, although this may be difficult to extract. The third part of the Standard contains authentic practical examples illustrating the principles established in Parts 1 and 2.

One further source of information is an I Mech E paper by Sherwin and Lees at Reference 6. Part 1 of this paper is a good review of current Weibull theory and Part 2 provides some insight into the practical problems inherent in its use.

1.2 Application to Sampled Defect Data

It is important to define the context in which the following Weibull analysis may be used. All that is stated subsequently is applicable to sampled defect data. This is a very different situation from that which exists on, say, the RB-211, for which Rolls-Royce has a complete database.
They know at any time the life distribution of all the in-service engines and their components, and their analysis can be done from a knowledge of the utilisations at failure and the current utilisation of all the non-failed components. Their form of Weibull analysis is unique to this situation of total visibility. It is assumed here, however, that most organisations are not in this fortunate position; their data will at best be some representative sample of the failures which have occurred, and of the utilisation of unfailed units. It cannot be stressed too highly, though, that the life of unfailed units must be known if a realistic estimate of lifetimes to failure is to be made, and, therefore, data must be collected on unfailed units in the sample.

Warwick Manufacturing Group

2 DATA

The basic elements in defect data analysis comprise: a population, from which some sample is taken in the form of times to failure (here time is taken to mean any appropriate measure of utilisation); an analytical technique such as Weibull, which is then applied to the sample of failure data to derive a mathematical model for the behaviour of the sample, and hopefully of the population also; and finally some deductions which are generated by an examination of the model. These deductions will influence the decisions to be made about the maintenance strategy for the population.

The most difficult part of this process is the acquisition of trustworthy data. No amount of elegance in the statistical treatment of the data will enable sound judgements to be made from invalid data.

Weibull analysis requires times to failure. This is higher quality data than a knowledge of the number of failures in an interval. A failure must be a defined event, and preferably an objective one rather than some subjectively assessed degradation in performance. A typical sample, therefore, might at its most superficial level comprise a collection of individual times to failure for the equipment under investigation.

2.1 Quality of Data

The quality of data is a most difficult feature to assess and yet its importance cannot be overstated. When there is a choice between a relatively large amount of dubious data and a relatively small amount of sound data, the latter is always preferred. The quality problem has several facets:

The data should be a statistically random sample of the population. Exactly what this means in terms of the hardware will differ in each case.
Clearly the modification state of equipments may be relevant to the failures being experienced, and failure data which cannot be allocated to one or other modification state is likely to be misleading. By an examination of the source of the data the user must satisfy himself that it contains no bias, or else recognise such a bias and confine the deductions accordingly. For example, data obtained from one user unit for an item experiencing failures of a nature which may be influenced by the quality of maintenance, local operating conditions or practices, or any other idiosyncrasy of that unit may be used, provided the conclusions drawn are suitably confined to the unit concerned.
A less obvious data quality problem concerns the measure of utilisation to be used; it must not only be the appropriate one for the equipment as a whole, but it must also be appropriate for the major failure modes. As will be seen later, an analysis at equipment level can be totally misleading if there are several significant failure modes each exhibiting their own type of behaviour. The view of the problem at equipment level may give a misleading indication of the counter-strategies to be employed. The more meaningful deeper examination will not be possible unless the data contains mode information at the right depth and degree of integrity.

It is necessary to know any other details which may have a bearing on the failure sensitivity of the equipment; for example, the installed position of the failed items which comprise the sample. There are many factors which may render elements of a sample unrepresentative, including such things as misuse or incorrect diagnosis.

2.2 Quantity of Data

Whereas the effects of poor quality are insidious, the effects of inadequate quantity of data are more apparent and can, in part, be countered. To see how this may be done it is necessary to examine one of the statistical characteristics used in Weibull analysis.

An equipment undergoing in-service failures will exhibit a cumulative distribution function, F(t), which is the distribution in time of the cumulative failure pattern, or cumulative percent failed as a function of time, as indicated by the sample.

Consider a sample of 5 failures (sample size n = 5). The symbol i is used to indicate the failure number once the failure times are ranked in ascending order; so here i will take the integer values 1 to 5 inclusive. Suppose the 5 failure times are 2, 7, 13, 19 and 27 cycles. Now the first failure at 2 cycles may be thought to correspond to an F(t) value of i/n, where i = 1 and n = 5.
i.e. F(t) at 2 cycles = 1/5 = 0.2 or 20%.

Similarly, for the second failure time of 7 cycles the corresponding F(t) is 40%, and so on. On this basis, the data suggest that the fifth failure at 27 cycles corresponds to a cumulative percent failed of 100%. In other words, on the basis of this sample, 100% of the population will fail by 27 cycles. Clearly this is unrealistic. A further sample of 10 items may contain one or more which exceed a 27 cycle life. A much larger sample of 1000 items may well indicate that, rather than corresponding to a 100% cumulative failure, 27 cycles corresponds to some lesser cumulative failure of, say, 85 or 90%.

This problem of small sample bias is best overcome as follows:

Sample Size Less Than 50. A table of Median Ranks has been calculated which gives a best estimate of the F(t) value corresponding to each failure time in the sample. This table is issued with these notes. It indicates that in the example just considered, the F(t) values corresponding to the 5 ascending failure times quoted are not 20%, 40%, 60%, 80% and 100%, but are 12.9%, 31.4%, 50%, 68.6% and 87.1%. It is this latter set of F(t) values which should be plotted against the corresponding ranked failure times on a Weibull plot. Median rank values give the best estimate for the primary Weibull parameter and are best suited to some later work on confidence limits.
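The median rank values quoted above for a sample of 5 can be checked numerically. The sketch below is not the method used to build the issued tables; it assumes the usual definition of the median rank of the i-th ordered failure as the value p for which the probability of at least i failures out of n equals 0.5, and solves for p by bisection. For comparison it also evaluates the simple i/n estimate and the (i - 0.3)/(n + 0.4) approximation given in the text that follows. The function names are illustrative only.

```python
from math import comb


def median_rank(i, n, tol=1e-10):
    """Median rank of the i-th of n ordered failures, found by bisection."""
    def prob_at_least_i(p):
        # Probability that at least i of n items have failed when each
        # has failed independently with probability p.
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least_i(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def approx_rank(i, n):
    """The (i - 0.3)/(n + 0.4) approximation to the median rank."""
    return (i - 0.3) / (n + 0.4)


n = 5
failure_times = [2, 7, 13, 19, 27]  # cycles, from the example in the text
exact = [median_rank(i, n) for i in range(1, n + 1)]
approx = [approx_rank(i, n) for i in range(1, n + 1)]

for i, t in enumerate(failure_times, start=1):
    print(f"t = {t:2d} cycles  i/n = {i / n:.3f}  "
          f"median rank = {exact[i - 1]:.3f}  approximation = {approx[i - 1]:.3f}")
```

The printed median ranks should agree with the tabulated 12.9%, 31.4%, 50%, 68.6% and 87.1% to within rounding.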
Sample Size Less Than 100. For sample sizes less than 100, in the absence of Median Rank tables, the true median rank values can be adequately approximated using Benard's Approximation:

F(t) = (i - 0.3)/(n + 0.4)

Sample Size Greater Than 100. Above a sample size of about 100 the problem of small sample bias is insignificant and the F(t) values may be calculated from the expression for the Mean Ranks:

F(t) = i/(n + 1)

3 THE MECHANICS OF WEIBULL ANALYSIS

3.1 The Value of Analysis

On occasions, an analysis of the data reveals little that was not apparent from engineering judgement applied to the physics of the failures and an examination of the raw data. On other occasions, however, the true behaviour of equipments can be obscured even when viewed by the most experienced assessor. It is always necessary to keep a balance between deductions drawn from data analysis and those which arise from an examination of the mechanics of failure. Ideally, these should suggest complementary rather than conflicting counter-strategies to unreliability.

There are many reliability characteristics of an item which may be of interest, and significantly more reliability measures or parameters which can be used to describe those characteristics. Weibull will provide meaningful information on 2 such characteristics. First, it will give some measure of how failures are distributed with time. Second, it will indicate the hazard regime for the failures under consideration. The significance of these 2 measures of reliability is described later.

Weibull is a 3-parameter distribution which has the great strength of being sufficiently flexible to encompass almost all the failure distributions found in practice, and hence provide information on the 3 failure regimes normally encountered.

Weibull analysis is primarily a graphical technique although it can be done analytically.
The danger in the analytical approach is that it takes away the picture and replaces it with apparent precision in terms of the evaluated parameters. This is generally considered to be poor practice, since it eliminates the judgement and experience of the plotter. Weibull plots are often used to provide a broad feel for the nature of the failures; this is why, to some extent, it is nonsense to worry about errors of about 1% when using Benard's approximation, when the process of plotting the points and fitting the best straight line will probably involve significantly larger errors. The aim is to appreciate in broad terms how the equipment is behaving, and Weibull can make such profound statements about an equipment's behaviour that an error of ±5% may be relatively trivial.
3.2 Evaluating the Weibull Parameters

The first stage of Weibull analysis, once the data has been obtained, is the estimation of the 3 Weibull parameters:

β: Shape parameter.
η: Scale parameter or characteristic life.
γ: Location parameter or minimum life.

The general expression for the Weibull F(t) is:

$$F(t) = 1 - \exp\left[-\left(\frac{t - \gamma}{\eta}\right)^{\beta}\right]$$

This can be transformed into:

$$\log\log\left[\frac{1}{1 - F(t)}\right] = \beta\log(t - \gamma) - \beta\log\eta$$

where log denotes the natural logarithm. It follows that if F(t) can be plotted against t (the corresponding failure times) on paper which has a reciprocal double log scale on one axis and a log scale on the other, and that data forms a straight line, then the data can be modelled by Weibull and the parameters extracted from the plot. A piece of 2 cycle Weibull paper (Chartwell Graph Data Ref C6572) is shown at Annex A; this is simply a piece of graph paper constructed such that its vertical scale is a double log reciprocal and its horizontal scale is a conventional log.

The mechanics of the plot are described progressively using the following example and the associated illustrations in plots 1 to 12 of Annex B.

Consider the following times to failure for a sample of 10 items: 410, 1050, 825, 300, 660, 900, 500, 1200, 750 and 600 hours.

Assemble the data in ascending order and tabulate it against the corresponding F(t) values for a sample size of 10, obtained from the Median Rank tables. The tabulation is shown at table 1 (Annex B).

Mark the appropriate time scale on the horizontal axis on a piece of Weibull paper (plot 2).

Plot on the Weibull paper the ranked hours at failure (t_i) on the horizontal axis against the corresponding F(t) value on the vertical axis (plot 3).

If the points constitute a reasonable straight line then construct that line. Note that real data frequently snakes about the straight line due to scatter in the data; this is not a problem providing the snaking motion is clearly to either side of the line.
When determining the position of the line, give more weight to the later points than to the early ones; this is necessary both because of the effects of cumulation and because the Weibull paper tends to give a disproportionate emphasis to the early points, which should be countered where these are at variance with the subsequent points. Do not attempt to draw more than one straight line through the data and do not construct a straight line where there is manifestly a curve. In this example the fitting of the line presents no problem (plot 4). Note also, on the matter of how much data is required for a Weibull plot, that any 4 or so of the pieces of data used here would give an adequate straight line. In such circumstances 4 points may well be enough. Generally, 7 or so points would be a reasonable minimum, depending on their shape once plotted.

The fact that the data produced a straight line when initially plotted enables 2 statements to be made:

The data can apparently be modelled by the Weibull distribution.

The location parameter or minimum life (γ) is approximately zero. This parameter is discussed later.

The next step is to construct a perpendicular from the Estimation Point in the top left hand corner of the paper to the plotted line (plot 5).

Once the plotted line is obtained, information based on the sample can be extracted. For example, plot 6 illustrates that this data is indicating that a 400 hour life would result in about 15% of in-service failures for these equipments. Conversely, an acceptable level of in-service failure may be converted into a life; for example, it can be seen from plot 6 that an acceptable level of in-service failure of, say, 30% would correspond to a life of about 550 hours, and so on.

At plot 7 a scale for the estimate of the Shape Parameter, β, is highlighted. This scale can be seen to range from 0.5 to 5, although values outside this range are possible. The estimated value of β, termed β̂, is given by the intersection of the constructed perpendicular and the β scale. In this example, β̂ is about 2.4 (plot 8).

At plot 9 a dotted horizontal line is highlighted corresponding to an F(t) value of 63.2%. Now the scale parameter or characteristic life estimate, η̂, is the life which corresponds to a cumulative mortality of 63.2% of the population.
Hence to determine η̂ it is necessary only to follow the η Estimator line horizontally until it intersects the plotted line and then read off the corresponding time on the lower scale. Plot 10 shows that, based on this sample, these components have an η̂ of about 830 hours. By this time 63.2% of them will have failed.

At plot 11 the proportion failed corresponding to the mean of the distribution of the times to failure (P_µ) is shown to be 52.5%, using the point of intersection of the perpendicular and the P_µ scale. This value is entered on the F(t) scale and its intersection with the plotted line determines the estimated mean of the distribution of the times to failure (µ̂). In this case this is about 740 hours.

One additional piece of information which can easily be extracted is the median life; that is to say, the life corresponding to 50% mortality. This is shown at plot 12 to be about 720 hours, based on this sample.
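The graphical procedure just described can be mimicked numerically. The sketch below is an assumption-laden stand-in for the eye-fitted line, not the method of these notes: it uses the (i - 0.3)/(n + 0.4) approximation in place of the Median Rank table, takes the location parameter as zero, and fits an ordinary unweighted least-squares line to log(-log(1 - F)) against log t. Because it weights all points equally rather than favouring the later ones, its estimates will be close to, but not identical with, the values of about 2.4 and 830 hours read from the plot. The function name is illustrative.

```python
from math import exp, log

# Times to failure (hours) for the sample of 10 items in the worked example.
times = [410, 1050, 825, 300, 660, 900, 500, 1200, 750, 600]


def fit_weibull_2p(failure_times):
    """Least-squares line on Weibull-paper axes; returns (shape, scale)."""
    t = sorted(failure_times)
    n = len(t)
    # Horizontal axis: log of the ranked failure times.
    x = [log(ti) for ti in t]
    # Vertical axis: double log reciprocal of the approximate median ranks.
    f = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    y = [log(-log(1 - fi)) for fi in f]

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    shape = sxy / sxx                    # slope of the line
    scale = exp(x_bar - y_bar / shape)   # time at which the line crosses y = 0 (63.2%)
    return shape, scale


beta_hat, eta_hat = fit_weibull_2p(times)
print(f"shape estimate = {beta_hat:.2f}, characteristic life estimate = {eta_hat:.0f} hours")
```

A fit of this kind is best used as a cross-check on the plot rather than a replacement for it, for the reasons given in Section 3.1.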
4 INTERPRETATION OF WEIBULL OUTPUT

4.1 Concept of Hazard

Before examining the significance of the Weibull shape parameter β, it is necessary to know something of the concept of hazard and the 3 so-called failure regimes. The parameter of interest here is the hazard rate, h(t). This is the conditional probability that an equipment will fail in a given interval of unit time, given that it has survived until that interval of time. It is, therefore, the instantaneous failure rate and can in general be thought of as a measure of the probability of failure, where this probability varies with the time the item has been in service. The 3 failure regimes are defined in terms of hazard rate and not, as is a common misconception, in terms of failure rate.

The 3 regimes are often thought of in the form of the so-called bath-tub curve; this is a valid concept for the behaviour of a system over its whole life but is a misleading model for the vast majority of components and, more importantly, their individual failure modes (see References 5 and 7). An individual mode is unlikely to exhibit more than one of the 3 characteristics of decreasing, constant or increasing hazard.

Shape Parameter Less Than Unity. A β value of less than unity indicates that the item or failure mode may be characterised by the first regime of decreasing hazard. This is sometimes termed the early failure or infant mortality period, and it is a common fallacy that such failures are unavoidable. The distribution of times to failure will follow a hyper-exponential distribution in which the instantaneous probability of failure is decreasing with time in service. This hyper-exponential distribution models a concentration of failure times at each end of the time scale; many items fail early or else go on to a substantial life, whilst relatively few fail between the extremes.
The extent to which β is below 1 is a measure of the severity of the early failures; β = 0.9, for example, would be a relatively weak early failure effect, particularly if the sample size, and therefore the confidence, was low.

If there is a single or a predominant failure mode with β < 1, then clearly component lifing is inappropriate, since the replacement is more likely to fail than the replaced item. Just as importantly, β < 1 gives a powerful indication of the causes of these failures, which are classically attributed to two deficiencies. First, such failures result from poor quality control in the manufacturing process or some other mechanism which permits the installation of low quality components. It is for this reason that burn-in programmes are the common counter-strategy to poor quality control for electronic components which would otherwise generate an unacceptably high initial in-service level of failure. The second primary cause of infant mortality is an inadequate standard of maintenance activity, and here the analysis is pointing to a lack of quality rather than quantity in the work undertaken.

The circumstance classically associated with infant mortality problems is the introduction of a new equipment, possibly of new design, which is unfamiliar to its operators and its maintainers. Clearly in such situations the high initial level of unreliability should decrease with the dissemination of experience and the replacement of weakling components with those of normal standard.

The problem of infant mortality has been shown to be much more prevalent than might have been anticipated. In one particular study (Part 2 of Reference 6) it was found to be the dominant failure regime on a variety of mechanical components of traditional design.
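The link between the shape parameter and the three hazard regimes introduced in Section 4.1 can be illustrated with the standard Weibull hazard function, h(t) = (β/η)((t - γ)/η)^(β - 1), which follows from the Weibull F(t) but is not written out in these notes. The sketch below evaluates it for three illustrative shape values; the characteristic life of 830 hours is borrowed from the worked example purely for scale, and the function name is an assumption.

```python
def weibull_hazard(t, beta, eta, gamma=0.0):
    """Standard Weibull hazard rate at time t (requires t > gamma)."""
    if t <= gamma:
        raise ValueError("hazard is evaluated only for t greater than gamma")
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1)


ETA = 830.0                 # hours; illustrative scale only
check_times = [100.0, 400.0, 800.0]

for beta in (0.5, 1.0, 2.4):
    rates = [weibull_hazard(t, beta, ETA) for t in check_times]
    if rates[0] > rates[-1]:
        regime = "decreasing hazard"
    elif rates[0] < rates[-1]:
        regime = "increasing hazard"
    else:
        regime = "constant hazard"
    print(f"shape {beta}: {regime}, h(t) = {[f'{r:.5f}' for r in rates]}")
```

With a shape value below unity the hazard falls with time in service, at unity it stays at 1/η, and above unity it rises, which is the basis of the three regimes.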
Shape Parameter Equal to Unity. When the shape parameter has a value of approximately one, the Weibull analysis is indicating that constant hazard conditions apply. This is the special case where the degree of hazard is not changing with time in service and such terms as failure rate, MTBF and MTTF may be used meaningfully. This is the most frequently assumed distribution, because to do so simplifies the mathematical manipulation significantly and opens up the possibility of using many other reliability techniques which are based on, but rarely state, the precondition that constant hazard conditions apply. To assume constant hazard, with its associated negative exponential distribution of times to failure, over some or all of an equipment's life must frequently produce misleading conclusions.

The term random failures is often used to describe constant hazard and refers to the necessary conditions that failures be independent of each other and of time. Equipments which predominantly suffer constant hazard over their working lives should not be lifed since, by definition, the replacement has the same hazard or instantaneous probability of failure as the replaced item.

Individual failure modes with β = 1 tend to be the exception. Frequently, an equipment will appear to exhibit constant hazard because it has several failure modes of a variety of types, none of which is dominant. This summation effect is a particular characteristic of complex maintained systems comprising multiple series elements, whether they be electronic, electrical, mechanical or some combination, particularly when their lives have been randomised by earlier failure replacements. The difficulty here is that the counter-strategies for the individual failure modes may well be significantly different from those suggested by constant hazard conditions for the system or equipment as a whole. There may well be, therefore, a need for a deeper analysis at mode level.
Typical counter-strategies to known constant hazard conditions include de-rating, redundancy or modification.

Shape Parameter Greater Than Unity. If the Weibull shape parameter is greater than one, the analysis is indicating that increasing hazard conditions apply. The instantaneous probability of failure is therefore increasing with time; the higher the β value, the greater is the rate of increase. This is often called the wear-out phase, although again this term can be misleading. The time dependence of failures now permits sensible consideration of planned replacement, providing the total cost of a failure replacement is greater than the total cost of a planned replacement. The interval for such replacements should be optimised and there is at least one general technique (Reference 8) which will do this directly from the Weibull parameters, providing the total costs are known.

Various values of β can be associated with certain distributions of times to failure and the commonest causes of such distributions. A β value of about 2 arises from a times to failure distribution which is roughly log-normal (see Figure 1).

[Figure 1 Probability Density Function for a Shape Parameter of 2: pdf against time for β = 2]

Such distributions may be attributable to a wear-out phenomenon but are classically generated by situations where failure is due to the nucleation effect of imperfections or weaknesses, such as in crack propagation. A shape parameter of about 2 is an indication, therefore, of fatigue failure.

As the β value increases above 2, the shape of the pdf approaches the symmetrical normal distribution until, at β = 3.4, the pdf is very nearly normal (Figure 2).

[Figure 2 Probability Density Function for a Shape Parameter of 3.4: pdf against time for β = 3.4]

A β value of this order indicates at least one dominant failure mode which is being caused by wear or attrition. As the β value rises still further, so does the rate of wear-out. Such situations need not necessarily be viewed with alarm; if the combined analysis for the 3 Weibull parameters indicates a pdf of the form shown in Figure 3, of which a very high β, say about 6 or 7, is just one element, then clearly a strategy to replace at t0 might be highly satisfactory, particularly if it is a critical component, since the evidence suggests there will be no in-service failures once that life is introduced.

[Figure 3 Probability Density Function for a Shape Parameter of 6 to 7: pdf against time for β = 6 or 7, with t0 marked on the time axis]
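The change in the shape of the pdf as the shape parameter rises can be given a rough numerical check using the skewness of the Weibull distribution, which is positive for a pdf with a long right-hand tail, near zero for a symmetrical pdf and negative for a long left-hand tail. Skewness is used here only as one convenient measure of symmetry; it is an assumption of this sketch and not a measure used elsewhere in these notes. The expression is the standard moment formula in terms of the gamma function and does not depend on the scale or location parameters.

```python
from math import gamma as gamma_fn


def weibull_skewness(beta):
    """Skewness of a Weibull distribution with shape parameter beta."""
    g1 = gamma_fn(1 + 1 / beta)   # first raw moment divided by the scale
    g2 = gamma_fn(1 + 2 / beta)   # second raw moment divided by scale squared
    g3 = gamma_fn(1 + 3 / beta)   # third raw moment divided by scale cubed
    variance = g2 - g1 ** 2
    third_central = g3 - 3 * g1 * g2 + 2 * g1 ** 3
    return third_central / variance ** 1.5


for beta in (2.0, 3.4, 6.0):
    print(f"shape {beta}: skewness = {weibull_skewness(beta):+.3f}")
```

On this measure a shape value of 2 gives a clearly right-skewed pdf, a value of about 3.4 gives a pdf that is close to symmetrical, and values of 6 or so give a modest left skew.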
The initiation of increasing hazard conditions and their rate of increase may be a function of the maintenance policy adopted and the operating conditions imposed on the equipment.

Some General Comments on β. The Weibull shape parameter provides a clear indication of which failure regime is the appropriate one for the mode under investigation and quantifies the degree of decreasing or increasing hazard. It can be used, therefore, to indicate which counter-strategies are most likely to succeed, and it aids interpretation of the physics of failure. It can also be used to quantify the effects of any modifications or maintenance policy changes.

Although the use of median ranks provides the best estimate of β by un-biasing the sample data, it is important to remember that the confidence which can be placed on the estimate for any given failure mode is primarily a function of the sample size and quality of the data for that mode.

4.2 Scale Parameter or Characteristic Life

As stated earlier, η is the value in time by which 63.2% of all failures will have occurred. In this sense, η is just one point on the time scale, providing some standard measure of the distribution of times to failure. Looking back at the example of 10 items, it was found that β̂ = 2.4 and η̂ = 830 hours. This information helps the construction of a picture of the appropriate pdf. To say here that the characteristic life is 830 hours is to say simply that roughly two thirds of all failures will occur by that time, according to this sample. As Sherwin showed in his study at Reference 6, this is a very useful means of quantifying the effects of some change in maintenance strategy. There are, however, other measures, some of which were evaluated in the example. The mean of this roughly log-normal distribution for these items was found to be about 740 hours and corresponded to a percent failed of 52.5%.
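The figures read from the plot in the worked example can be cross-checked against the closed-form Weibull expressions, taking the graphical estimates of 2.4 for the shape parameter and 830 hours for the characteristic life as given and the location parameter as zero. This is a sketch for comparison only; the helper names are illustrative, and small differences from the plotted values are to be expected because the plot is read by eye.

```python
from math import exp, log, gamma as gamma_fn

BETA = 2.4    # shape estimate read from the plot
ETA = 830.0   # characteristic life estimate in hours, read from the plot


def fraction_failed(t):
    """Cumulative fraction failed by time t for the 2-parameter Weibull."""
    return 1 - exp(-((t / ETA) ** BETA))


def life_at(fraction):
    """Life by which the given cumulative fraction will have failed."""
    return ETA * (-log(1 - fraction)) ** (1 / BETA)


mean_life = ETA * gamma_fn(1 + 1 / BETA)      # mean of the distribution
fraction_at_mean = fraction_failed(mean_life)
median_life = life_at(0.5)

print(f"fraction failed by 400 hours : {fraction_failed(400):.1%}")
print(f"life for 30% failed          : {life_at(0.30):.0f} hours")
print(f"fraction failed by 830 hours : {fraction_failed(ETA):.1%}")
print(f"mean life                    : {mean_life:.0f} hours")
print(f"fraction failed at the mean  : {fraction_at_mean:.1%}")
print(f"median life                  : {median_life:.0f} hours")
```

The results should fall close to the values quoted from plots 6 to 12: about 15% failed at 400 hours, a life of about 550 hours for 30% failed, a mean of about 740 hours and a median of about 720 hours.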
Figure 5 can be sketched using these estimates. Alternatively, the median or 50% life was found to be about 720 hours (Figure 6). Here the 3 measures of time are all doing roughly the same thing. The characteristic life, however, is taken as the standard measure of position. Its significance is strengthened by the fact that when constant hazard conditions apply, i.e. β = 1, the η value becomes the mean time between failures (MTBF) for a repairable equipment or the mean time to failure (MTTF) for a non-repairable equipment, and is therefore the inverse of the constant hazard failure rate. This is the only circumstance in which η may be termed an MTBF/MTTF.

[Figure 4 Probability Density Function and Characteristic Life: f(t) against time for β̂ = 2.4, with 63.2% of failures occurring by η̂ = 830]

[Figure 5 Probability Density Function and Mean Life: f(t) against time for β̂ = 2.4, with 52.7% of failures occurring by µ̂ = 740]

[Figure 6 Probability Density Function and Median Life: f(t) against time for β̂ = 2.4, with 50% life = 720]

4.3 Location Parameter or Minimum Life

It was briefly stated during the example that if a reasonable straight line could be fitted to the initial plot, then the value of the location parameter γ is approximately zero. Sometimes, however, the first plot may appear concave when viewed from the bottom right hand corner of the sheet (Figure 7). When this occurs it is necessary to subtract some quantity of time (γ) from every failure time used to plot the curve. This quantity is best estimated by a method attributed to General Motors and shown in Annex C. Using this or any other suitable method, an estimate of γ, termed γ̂, can be obtained. The estimate is refined by subtracting its value from every failure time and replotting the data: if γ̂ is too small the curve will remain concave, but to a lesser degree than before; if γ̂ is too large the plot will become convex; and the best estimate of γ is that value which, when subtracted from all the failure times, gives the best straight line.
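The trial-and-error refinement described above can be sketched numerically. The code below is not the General Motors method of Annex C; it is an assumed alternative which tries a grid of candidate minimum lives, subtracts each from every failure time, and keeps the candidate giving the straightest line, with straightness measured by the squared correlation coefficient on Weibull-paper axes. The data are synthetic and purely illustrative: they are generated to lie exactly on a Weibull line with a shape of 2.4, a characteristic life of 830 hours and a minimum life of 425 hours, so that the search has a known answer.

```python
from math import log


def plotting_positions(n):
    """Approximate median ranks, (i - 0.3)/(n + 0.4), for i = 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def straightness(times, gamma):
    """Squared correlation of the Weibull plot after subtracting gamma."""
    t = sorted(times)
    n = len(t)
    x = [log(ti - gamma) for ti in t]
    y = [log(-log(1 - f)) for f in plotting_positions(n)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def estimate_gamma(times, step=5.0):
    """Grid search for the minimum life giving the straightest plot."""
    best_gamma = 0.0
    best_r2 = straightness(times, 0.0)
    candidate = step
    while candidate < min(times):   # gamma must stay below the earliest failure
        r2 = straightness(times, candidate)
        if r2 > best_r2:
            best_gamma, best_r2 = candidate, r2
        candidate += step
    return best_gamma


# Synthetic sample lying exactly on a 3-parameter Weibull line (illustrative).
TRUE_BETA, TRUE_ETA, TRUE_GAMMA = 2.4, 830.0, 425.0
sample = [TRUE_GAMMA + TRUE_ETA * (-log(1 - f)) ** (1 / TRUE_BETA)
          for f in plotting_positions(10)]

gamma_hat = estimate_gamma(sample)
print(f"estimated minimum life = {gamma_hat:.0f} hours")
```

With real, scattered data the straightness measure is usually much flatter near its maximum, so an estimate obtained this way should still be checked by replotting.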
[Figure 7 Representing Points on a Curve using Weibull Paper: plotted points forming a curve, F(t) against time]

The significance of γ is that it is some value of time by which the complete distribution of times to failure is shifted, normally to the right; hence the term location. In the earlier example the distribution with γ = 0 is shown at Figure 4. If, however, γ had taken some positive value, say 425 hours, then this value must be added to all the times to failure extracted from the subsequent analysis of the straight line, and Figure 4 would have changed to that illustrated at Figure 8. Here roughly two thirds of the population will have failed by 1255 hours and, most importantly, the γ value or minimum life has shifted the time origin such that no failures are anticipated in the first 425 hours of service. The existence of a positive location parameter is therefore a highly desirable feature in any equipment and the initial plot should always be examined for a potential concave form. A further example of a 3-parameter Weibull plot is given at Annex D.

[Figure 8 Effect of Location Parameter: f(t) against time for β̂ = 2.4 and γ̂ = 425, with 63.2% of failures occurring by the new characteristic life of 830 + 425 = 1255]
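The effect of a positive location parameter can be confirmed directly from the 3-parameter Weibull F(t). The sketch below uses the illustrative values from the text (a shape of 2.4, a characteristic life of 830 hours measured from the shifted origin, and a minimum life of 425 hours); the function name is an assumption.

```python
from math import exp


def weibull_cdf(t, beta, eta, gamma=0.0):
    """Cumulative fraction failed by time t for the 3-parameter Weibull."""
    if t <= gamma:
        return 0.0  # no failures are expected before the minimum life
    return 1 - exp(-(((t - gamma) / eta) ** beta))


BETA, ETA, GAMMA = 2.4, 830.0, 425.0

for t in (400.0, 425.0, 830.0, 1255.0):
    print(f"t = {t:6.0f} hours  "
          f"gamma = 0: {weibull_cdf(t, BETA, ETA):.1%}  "
          f"gamma = 425: {weibull_cdf(t, BETA, ETA, GAMMA):.1%}")
```

The 63.2% point moves from 830 hours to 830 + 425 = 1255 hours, and the fraction failed remains zero up to 425 hours.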
5 PRACTICAL DIFFICULTIES WITH WEIBULL PLOTTING 5.1 Scatter The problem of scatter in the original data and the resultant snaking effect this can produce has been briefly mentioned. At Annex E, however, is a plot using 11 pieces of real data which illustrates a severe case of snaking. It is possible to plot a line and an attempt has been made in this case which gives the necessary added weight to later points. The difficulty is obvious; it is necessary to satisfy yourself that you are seeing true snaking about a straight line caused by scatter of the points about the line and not some other phenomenon. 5.2 Extrapolation Successful Weibull plotting relies on having historical failure data. Inaccuracies will arise if the span in time of that data is not significantly greater than the mean of the distribution of times to failure. If data obtained over an inadequate range is used as a basis for extrapolation (i.e.) extending the plotted line significantly, estimates of the 3 parameters are likely to be inaccurate and may well fail to reveal characteristics of later life such as a bi-modal wear-out phenomenon. The solution is comprehensive data at the right level. 5.3 Multi-Modal Failures The difficulty of multi-modal failures has been mentioned previously. In the same way that the distribution of times to failures for a single mode will be a characteristic of that mode, so the more modes there are contributing to the failure data, the more the individual characteristics of number of failure modes often tends to look like constant hazard ( = 1.0). In some cases this has been found to be so even when the modes themselves have all had a high wearout characteristic ( 3 or 4). This tendency is strongest when there are many modes none of which is dominant. Hence a knowledge of the failure regimes of the individual failure modes of an equipment is more useful in formulating a maintenance policy than that of the failure regime of the equipment itself. 
The solution once again is data precise enough to identify the characteristics of all the significant failure modes. A Weibull plot using data gathered at equipment level may or may not indicate multi-modal behaviour. The most frequent manifestation of such behaviour is a convex or cranked plot as shown in Figure 9.

[Figure 9: Representation of Multi-Modal Behaviour on Weibull Paper (F(t) against time)]

The cranked plot shown in Figure 9 should not normally be drawn, since it implies the existence of 2 failure regimes, one following the other in time. This is rarely the case; in general the points of a bi- or multi-modal plot will be found to be mixed along both lines, because the distributions of times to failure themselves overlap. This is illustrated in Figure 10.

[Figure 10: Multiple Probability Density Functions (f(t) against time; mode 1 with β < 1, hence infant mortality; mode 2 with β > 1, showing time dependent failures)]

One example of this bi-modal behaviour is quoted in Reference 6. There a vacuum pump was found to have one mode of severe infant mortality (β = 0.42) combined with another of wearout (β = 3.2). It is most unlikely that an analysis of their combined times to failure would have suggested an adequate maintenance strategy for the item as a whole.

The convex curve also shown in Figure 9 indicates the presence of corrupt or multi-modal data. One form of corruption stems from the concept of a negative location parameter: if life is consumed in storage but the failure data under analysis uses an in-service life measured from the point at which the items are issued from store, then clearly the data is corrupt in that only a part life is being used in the analysis.

Once adequate multi-modal data has been obtained it is possible to separate the data for each mode and replot all the data in such a way as to make maximum use of every piece of life information. This approach provides more confidence than simply plotting failure data for the individual mode and is best done using an adaptation of the technique for dealing with multiply-censored data; this topic is covered later.
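The way two overlapping modes distort a plot can be illustrated numerically. The following is a minimal sketch and not part of the original notes. It assumes the two modes act as independent competing risks, so that an item survives only if it survives both, and it borrows the shape parameters quoted for the vacuum pump (β = 0.42 and β = 3.2). The characteristic lives of 5000 and 1000 hours, and all function names, are purely illustrative assumptions. The sketch evaluates the local slope of the combined curve in Weibull paper coordinates, that is ln(-ln(1 - F)) against ln t.

```python
import math

# Shape parameters are those quoted from Reference 6; the characteristic
# lives (eta) are illustrative assumptions, not values from the notes.
MODES = [
    (0.42, 5000.0),  # (beta, eta): infant mortality mode
    (3.2, 1000.0),   # (beta, eta): wear-out mode
]


def combined_cum_hazard(t, modes=MODES):
    """Cumulative hazard of independent competing Weibull modes."""
    return sum((t / eta) ** beta for beta, eta in modes)


def combined_F(t, modes=MODES):
    """Cumulative fraction failed from either mode by time t."""
    return 1.0 - math.exp(-combined_cum_hazard(t, modes))


def local_slope(t, modes=MODES, rel_step=1e-4):
    """Slope of ln(-ln(1 - F)) against ln(t): the apparent beta near time t."""
    t_lo, t_hi = t * (1.0 - rel_step), t * (1.0 + rel_step)
    # ln(-ln(1 - F)) is simply the log of the cumulative hazard.
    y_lo = math.log(combined_cum_hazard(t_lo, modes))
    y_hi = math.log(combined_cum_hazard(t_hi, modes))
    return (y_hi - y_lo) / (math.log(t_hi) - math.log(t_lo))


if __name__ == "__main__":
    for t in (1, 10, 100, 500, 1000, 2000):
        print(f"t = {t:5d} h   F = {combined_F(t):.3f}   "
              f"apparent beta = {local_slope(t):.2f}")
```

Under these assumptions the apparent slope rises steadily from close to 0.42 at early times towards 3.2 at late times, so no single straight line describes the combined data, and a line forced through it would misrepresent both modes.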
5.4 Confidence Limits

As was pointed out earlier, most forms of analysis will give a false impression of accuracy and Weibull is no exception, particularly when the sample size is less than 50. The limitations of the data are best recognised by the construction of suitable confidence limits on the original plot. The confidence limits normally employed are the 95% lower confidence limit (LCL) or 5% Ranks, and the 5% upper confidence limit (UCL) or 95% Ranks, although other levels of confidence can be used. With these notes are tables of LCL and UCL ranks which can be seen to be a function solely of sample size.

The technique for using these ranks consists of entering the vertical axis of the Weibull plot at the ith F(t) value quoted in the tables for the appropriate sample size. A straight horizontal line should be drawn from the point of entry to intersect the line constructed from the data. From the point of intersection, move vertically up (for a lower limit) or down (for an upper limit) until horizontal with the corresponding ith plotted point.

The technique is shown at Plot 1 of Annex F for the lower bound using the same example as in Annex B. The first value obtained from the table for a sample size of 10 is 0.5; this cannot be used since it does not intersect the plotted line. The next value is 3.6 and this is shown in Plot 1 to generate point (1) on the lower bound. The third point of entry is at 8.7 and this is shown to produce point (2), which is level with the third plotted point for the straight line, and so on.

The primary use of the lower bound curve constructed through the final set of points is that it is a visual statement of how bad this equipment might be and still give rise to the raw data observed, with 95% confidence. Hence it can be said here that although the best estimate for η is 830 hours, we can be only 95% confident, based on the data used, that the true η is greater than or equal to 615 hours. Similarly at Plot 2, which shows the construction of a 95% upper bound, we can be 95% confident that the true η is less than or equal to 1040 hours. These 2 statements can be combined to give symmetrical 90% confidence limits of between 615 and 1040 hours. This range can only be reduced by either diminishing the confidence level (and therefore increasing the risk of erroneous deduction) or by increasing the quantity of data.

5.5 Censoring of Sample Data

Often samples contain information on incomplete times to failure in addition to the more obviously useful consumed lives at failure. This incomplete data may arise because an item has to be withdrawn for some reason other than the failure which is being studied.
If the equipment suffers multi-modal failures then, in an analysis of a particular mode, failure times attributable to all other modes become censorings. Alternatively the data collection period may end without some equipments failing, i.e. with unknown finish times. The outcome of such situations is generally a series of complete failure times and a series of incomplete failure times, or censorings, for the mode under investigation. This latter information, the collection of times when the equipment did not fail for the particular reason, cannot be ignored, since to do so would bias the analysis and diminish the confidence level associated with subsequent statements drawn from the plot. The assumption is generally made that the non-failures would have failed with equal probability at any time between the known failures or censored lives, or after all of them. Therefore an item removed during inspection because it is nearing unacceptable limits is closer to a failure and is not a censoring.

The mechanics of dealing with censored data require the determination of a mean order number for each failure. This may be considered as an alternative to the failure number i used previously, the primary difference being that the mean order number becomes a non-integer once the first censoring is reached. The technique is outlined using the example at Annex G. As a first step a table is constructed with columns (a) and (b) listing in ascending order the failure and censoring times respectively. Column (c) is calculated as the survivors prior to each event in either of columns (a) or (b); where the event is a censoring the corresponding surviving number is shown in parentheses by convention. Clearly the data in the sample is multiply-censored in that it is a mixture of failure and censored times; a total of 7 failures and 9 censorings gives a sample size n of 16.
Column (d) is obtained using the formula:

$$m_i = m_{i-1} + \frac{n + 1 - m_{i-1}}{1 + k_i}$$

where
m_i = current mean order number
m_{i-1} = previous mean order number
n = total sample size for failures and censorings
k_i = number of survivors prior to the failure or censoring under consideration

Mean order number values are determined only for failures. Once the first censoring occurs at 65, all subsequent m_i values are non-integers. The median rank values at column (e) are taken from the median rank tables, using linear interpolation when necessary. For purposes of comparison only, the equivalent median ranks obtained from Bernard's Approximation, (i - 0.3)/(n + 0.4), are included at column (f). These are obtained by substituting m_i for i in the standard expression, and can be seen to be largely in agreement with the more exact figures in column (e). Finally, 5% LCL and 95% UCL figures are included at columns (g) and (h). These are obtained from the tables using linear interpolation where necessary.

The median rank figures in column (e) are plotted on Weibull paper against the corresponding failure times at column (a) in the normal way. The plot is illustrated at Plot 1 of Annex G and produces estimates of the Weibull parameters without difficulty. For completeness, Plot 2 shows the 5% LCL and 95% UCL curves; a 90% confidence range for η of between 90 and 148 units of time is obtained.

6 COMPARISON WITH HAZARD PLOTTING

It is often thought that Weibull plots are no better than plotting techniques based on the cumulative hazard function calculated from sample data. Such methods will give estimates of the 3 Weibull parameters and the mechanics of obtaining them are often slightly simpler than for the equivalent application of Weibull. However, cumulative hazard plots give little feel for the behaviour of the equipment in terms of the levels of risk of in-service failures for a proposed life.
More importantly, such methods contain no correction for small sample bias and are therefore less suitable for use with samples smaller than 50. This limitation is compounded by the difficulty of evaluating confidence limits on a cumulative hazard plot. Finally, cases have occurred where cumulative hazard plots have failed to indicate multi-modal behaviour which was readily apparent from a conventional Weibull plot of the same data.
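For comparison, the cumulative hazard calculation referred to in this section can be sketched as follows. The notes do not specify a particular method, so this sketch assumes the common form in which each failure contributes the reciprocal of the number of items still at risk immediately before it, and applies it to the Annex G sample. The function name and data layout are illustrative assumptions.

```python
def cumulative_hazard(failures, censorings):
    """Return (time, cumulative hazard) pairs, one per failure.

    Assumed method: each failure adds 1/k, where k is the number of items
    still at risk (survivors) immediately before that failure. Censorings
    add nothing themselves but reduce the number at risk for later events.
    """
    events = [(t, True) for t in failures] + [(t, False) for t in censorings]
    events.sort(key=lambda event: event[0])
    n = len(events)
    total = 0.0
    points = []
    for position, (t, is_failure) in enumerate(events):
        at_risk = n - position
        if is_failure:
            total += 1.0 / at_risk
            points.append((t, total))
    return points


# Annex G sample: 7 failures and 9 censorings, n = 16.
FAILURES = [31.7, 39.2, 57.5, 65.8, 70.0, 105.8, 110.0]
CENSORINGS = [65.0, 75.0, 75.0, 87.5, 88.3, 84.2, 101.7, 109.2, 130.0]

if __name__ == "__main__":
    for t, h in cumulative_hazard(FAILURES, CENSORINGS):
        print(f"t = {t:6.1f}   H = {h:.3f}")
```

For a Weibull distribution the cumulative hazard is H(t) = ((t - γ)/η)^β, so a plot of ln H against ln t plays the same role as the Weibull plot. Note that the sketch applies no small sample correction of the kind that median ranks provide.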
7 CONCLUSIONS

The ability of the Weibull distribution to model failure situations of many types, including those where non-constant hazard conditions apply, makes it one of the most generally useful distributions for analysing failure data. The information it provides, both in terms of the modelled distribution of times to failure and the prevailing failure regime, is fundamental to the selection of a successful maintenance strategy, whether or not component lifing is an element in that strategy.

Weibull's use of median ranks helps overcome the problems inherent in small samples. The degree of risk associated with small samples can be quantified using confidence limits, and this can be done for complete or multiply-censored data.

Weibull plots can quantify the risks associated with a proposed lifing policy and can indicate the likely distribution of failure arisings. In addition, they may well indicate the presence of more than one failure mode. However, Weibull is not an autonomous process for providing instant solutions; it must be used in conjunction with a knowledge of the mechanics of the failures under study.

The final point to be made is that Weibull, like all such techniques, relies upon data of adequate quantity and quality; this is particularly true of multi-modal failure patterns.

REFERENCES

1. Weibull W. A statistical distribution function of wide application. ASME paper 51-A-6, Nov 1951.
2. Mann N R, Schafer R E and Singpurwalla N D. Methods for statistical analysis of reliability and life data. Wiley, 1974.
3. Bompas-Smith J H. Mechanical survival - the use of reliability data. McGraw-Hill, 1973.
4. Carter A D S. Mechanical reliability. Macmillan, 1972.
5. British Standard 5760: Part 2: 1981. Reliability of systems, equipments and components; guide to the assessment of reliability.
6. Sherwin D J and Lees F P. An investigation of the application of failure data analysis to decision making in the maintenance of process plant. Proc Instn Mech Engrs, Vol 194, No 29, 1980.
7. Carter A D S. The bathtub curve for mechanical components - fact or fiction. Conference on Improvement of Reliability in Engineering, Instn Mech Engrs, Loughborough, 1973.
8. Glasser G J. Planned replacement: some theory and its application. Journal of Quality Technology, Vol 1, No 2, April 1969.
ANNEX A TWO CYCLE WEIBULL PAPER
ANNEX B PROGRESSIVE EXAMPLE OF WEIBULL PLOTTING

Arranging the Raw Data

Failure Number   Ranked Hours at Failure   Median Rank, Cumulative % Failed
     (i)                 (t_i)                         F(t)
      1                   300                           6.7
      2                   410                          16.2
      3                   500                          25.9
      4                   600                          35.5
      5                   660                          45.2
      6                   750                          54.8
      7                   825                          64.5
      8                   900                          74.1
      9                  1050                          83.8
     10                  1200                          93.3

The following plots illustrate Weibull plotting.
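The graphical procedure has a simple numerical counterpart, sketched here under stated assumptions: median ranks are approximated with Bernard's expression (i - 0.3)/(n + 0.4) rather than read from tables, and the straight line is fitted by ordinary least squares on the Weibull paper coordinates rather than by eye. The function names are illustrative and the result is not a substitute for inspecting the plot.

```python
import math

# Ranked hours at failure from the table above.
HOURS = [300, 410, 500, 600, 660, 750, 825, 900, 1050, 1200]


def bernard_median_ranks(n):
    """Approximate median ranks (as fractions) for failure numbers 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull_2p(times):
    """Least-squares line through (ln t, ln(-ln(1 - F))); returns (beta, eta).

    Assumes a 2-parameter Weibull (location parameter of zero) and a
    complete, uncensored sample.
    """
    times = sorted(times)
    ranks = bernard_median_ranks(len(times))
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # slope of the line is the shape parameter
    intercept = y_mean - beta * x_mean
    eta = math.exp(-intercept / beta)  # the line crosses y = 0 at t = eta
    return beta, eta


if __name__ == "__main__":
    beta, eta = fit_weibull_2p(HOURS)
    print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```

The fitted values should fall in the same region as those read from the plots, though not identically, since least squares weights the points differently from a line drawn by eye.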
ANNEX C ESTIMATION OF WEIBULL LOCATION PARAMETER

Steps:

1. Plot the data initially, observing a concave curve when viewed from the bottom right hand corner.
2. Select 2 extreme points on the vertical scale (say a and b), and determine the corresponding failure times (t1 and t3).
3. Divide the physical distance between points a and b in half without regard for the scale of the vertical axis, and so obtain point c.
4. Determine the failure time corresponding to point c (i.e. t2).
5. The estimate of the location parameter γ is given by:

$$\hat{\gamma} = t_2 - \frac{(t_3 - t_2)(t_2 - t_1)}{(t_3 - t_2) - (t_2 - t_1)}$$

[Figure 11: Estimation of Location Parameter (Weibull plot showing points a and b on the vertical axis and the corresponding times t1, t2 and t3 on the time axis)]
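The expression in step 5 is easily evaluated in code. The sketch below is illustrative only; the function name is an assumption, and the three times used in the example are those read from Plot 2 of Annex D (810, 1500 and 4000 hours).

```python
def estimate_location(t1, t2, t3):
    """Estimate the Weibull location parameter from three plotted times.

    t1 and t3 are the times at two extreme points on the vertical scale,
    and t2 is the time at the point physically midway between them.
    """
    upper = t3 - t2
    lower = t2 - t1
    if upper == lower:
        # Equal spacing gives a zero denominator; the expression is undefined.
        raise ValueError("t2 - t1 equals t3 - t2; the estimate is undefined")
    return t2 - (upper * lower) / (upper - lower)


if __name__ == "__main__":
    gamma = estimate_location(810, 1500, 4000)
    print(f"estimated location parameter = {gamma:.0f} hours")
```

The expression follows from requiring that, once γ is subtracted, the three points lie on a straight line: equal vertical spacing then demands equal spacing of ln(t - γ), that is (t2 - γ)² = (t1 - γ)(t3 - γ).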
ANNEX D EXAMPLE OF A 3-PARAMETER WEIBULL PLOT

Problem: to determine the Weibull parameters for the following (ordered) sample times to failure: 1000, 1300, 1550, 1850, 2100, 2450 and 3000 hours.

Steps:

1. Plot initially (Plot 1).
2. Having identified a concave form, apply the technique at Annex C (Plot 2).
3. Determine γ and evaluate modified times to failure.
4. Plot modified points and confirm a straight line (Plot 3).
5. Extract β and η in the normal way, remembering to add γ to the straight line value for η (Plot 4).
6. Sketch the probability density function (Plot 5).

Plotting the raw data:

Failure Number   Ranked Hours at Failure   Median Rank, Cumulative % Failed
     (i)                 (t_i)                         F(t)
      1                  1000                           9.4
      2                  1300                          22.8
      3                  1550                          36.4
      4                  1850                          50.0
      5                  2100                          63.6
      6                  2450                          77.2
      7                  3000                          90.6
From Plot 2:

t1 = 810 hours
t2 = 1500 hours
t3 = 4000 hours

General expression from Annex C:

$$\hat{\gamma} = t_2 - \frac{(t_3 - t_2)(t_2 - t_1)}{(t_3 - t_2) - (t_2 - t_1)}$$

$$\hat{\gamma} = 1500 - \frac{(4000 - 1500)(1500 - 810)}{(4000 - 1500) - (1500 - 810)} = 1500 - 953 = 547 \text{ hours}$$

Replot using:

1000 - 547 = 453
1300 - 547 = 753
1550 - 547 = 1003
1850 - 547 = 1303
2100 - 547 = 1553
2450 - 547 = 1903
3000 - 547 = 2453

[Figure 12: Probability Density Function (f(t) against time; β = 1.9, γ = 547, η = 1560, η new = 547 + 1560 = 2107, with 63.2% of failures occurring before η new)]
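Steps 3 to 5 of this annex can be checked numerically. The sketch below is an illustration rather than the method of the notes: it takes γ = 547 hours and the tabulated median ranks as given, fits the line by ordinary least squares instead of by eye, and uses the squared correlation coefficient as a rough stand-in for the visual judgement of straightness. The function and variable names are assumptions.

```python
import math

GAMMA = 547.0  # location parameter estimated above (hours)
HOURS = [1000, 1300, 1550, 1850, 2100, 2450, 3000]
MEDIAN_RANK_PCT = [9.4, 22.8, 36.4, 50.0, 63.6, 77.2, 90.6]


def weibull_line(times, rank_pct):
    """Least-squares fit on Weibull paper coordinates.

    Returns (beta, eta, r_squared) for the line through
    (ln t, ln(-ln(1 - F))).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - p / 100.0)) for p in rank_pct]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    eta = math.exp(x_mean - y_mean / beta)  # time at which the line gives y = 0
    return beta, eta, sxy * sxy / (sxx * syy)


# Step 1: raw times (concave plot). Steps 3 and 4: times with gamma removed.
_, _, r2_raw = weibull_line(HOURS, MEDIAN_RANK_PCT)
shifted = [t - GAMMA for t in HOURS]
beta, eta, r2_shifted = weibull_line(shifted, MEDIAN_RANK_PCT)

# Step 5: add gamma back to the straight line value of eta.
eta_new = eta + GAMMA

if __name__ == "__main__":
    print(f"r squared: raw = {r2_raw:.4f}, shifted = {r2_shifted:.4f}")
    print(f"beta = {beta:.2f}, eta = {eta:.0f} h, eta + gamma = {eta_new:.0f} h")
```

The shifted times should lie closer to a straight line than the raw times, and the fitted β and η should come out close to the values read from the plots.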
ANNEX E THE EFFECT OF SCATTER
ANNEX F 95% CONFIDENCE LIMITS FOR WEIBULL
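Where rank tables are not to hand, the 5%, median and 95% ranks can be computed directly. The sketch below is an assumption about how such tables are usually constructed rather than a statement of how the tables with these notes were produced: it treats the rank at a given level as the value of F at which the probability of seeing at least i failures among n items equals that level, and solves for F by bisection. The function names are illustrative.

```python
import math


def binomial_tail(n, i, p):
    """Probability that at least i of n items have failed, each with chance p."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(i, n + 1))


def rank(n, i, level):
    """Rank (as a fraction) for failure number i of n at the given level.

    level = 0.05 gives the 5% rank, 0.50 the median rank and 0.95 the
    95% rank. Solved by bisection, since the tail probability increases
    steadily with p.
    """
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if binomial_tail(n, i, mid) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    n = 10
    print(" i    5% rank   median   95% rank")
    for i in range(1, n + 1):
        row = [100.0 * rank(n, i, level) for level in (0.05, 0.50, 0.95)]
        print(f"{i:2d}   {row[0]:7.2f}  {row[1]:7.2f}  {row[2]:8.2f}")
```

For a sample size of 10 the first three 5% ranks computed this way correspond to the entry values of roughly 0.5, 3.6 and 8.7 used in section 5.4, and the median ranks correspond to the F(t) column of Annex B.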
ANNEX G WEIBULL PLOT OF MULTIPLY-CENSORED DATA
Multiply Censored Data

  (a)        (b)         (c)         (d)          (e)        (f)          (g)           (h)
Failure   Censoring   Survivors   Mean Order   Median    Bernard's    5% Rank       95% Rank
Times     Times                   Number       Ranks     Approx       Lower Bound   Upper Bound
t_i       c_i         k_i         m_i          %         %

 31.7                   16          1            4.2       4.27         0.3          17
 39.2                   15          2           10.2      10.37         2.2          26
 57.5                   14          3           16.3      16.46         5.3          34
            65.0       (13)
 65.8                   12          4.08        22.89     23.05         9.32         42.48
 70.0                   11          5.16        29.49     29.63        13.8          49.12
            75.0       (10)
            75.0        (9)
            87.5        (8)
            88.3        (7)
            84.2        (6)
           101.7        (5)
105.8                    4          7.53        44.03     44.09        25.65         64.18
           109.2        (3)
110.0                    2         10.69        63.31     63.35        43.14         80.45
           130.0        (1)

Worked mean order numbers for the first two failures after the first censoring:

at 65.8:  m = 3 + (16 + 1 - 3)/(1 + 12) = 4.08
at 70.0:  m = 4.08 + (16 + 1 - 4.08)/(1 + 11) = 5.16
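Columns (c), (d) and (f) of the table can be reproduced with a few lines of code. This is a minimal sketch with illustrative names; it carries full precision from one failure to the next, so its mean order numbers may differ from the tabulated ones in the second decimal place, where the table appears to round at each step.

```python
def mean_order_numbers(failures, censorings):
    """Return one (time, k, m, bernard_pct) tuple per failure.

    k is the number of survivors prior to the event, m the mean order
    number from m = m_prev + (n + 1 - m_prev) / (1 + k), and bernard_pct
    the Bernard approximation 100 * (m - 0.3) / (n + 0.4).
    """
    events = [(t, True) for t in failures] + [(t, False) for t in censorings]
    events.sort(key=lambda event: event[0])
    n = len(events)
    m_prev = 0.0
    rows = []
    for position, (t, is_failure) in enumerate(events):
        k = n - position  # survivors prior to this failure or censoring
        if is_failure:
            m = m_prev + (n + 1 - m_prev) / (1 + k)
            rows.append((t, k, m, 100.0 * (m - 0.3) / (n + 0.4)))
            m_prev = m
    return rows


FAILURES = [31.7, 39.2, 57.5, 65.8, 70.0, 105.8, 110.0]
CENSORINGS = [65.0, 75.0, 75.0, 87.5, 88.3, 84.2, 101.7, 109.2, 130.0]

if __name__ == "__main__":
    for t, k, m, pct in mean_order_numbers(FAILURES, CENSORINGS):
        print(f"t = {t:6.1f}   k = {k:2d}   m = {m:6.2f}   Bernard = {pct:5.2f}%")
```

Starting from a previous mean order number of zero, the same formula returns 1, 2 and 3 for the first three failures, so no special case is needed before the first censoring.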