Availability, Reliability, Maintainability, and Capability

H. Paul Barringer, P.E.
Barringer & Associates, Inc.
Humble, TX

Triplex Chapter Of The Vibrations Institute
Hilton Hotel
Beaumont, Texas
February 18, 1997

H. Paul Barringer, P.E.
Barringer & Associates, Inc.
P.O. Box 3985, Humble, TX 77347-3985
Phone: 281-852-6810, FAX: 281-852-3749
hpaul@barringer1.com
http://www.barringer1.com

Availability, reliability, maintainability, and capability are components of the effectiveness equation. The effectiveness equation is a figure of merit which is helpful for deciding which component(s) detract from performance measures. In many continuous process plants the reliability component is the largest detractor from better performance. Calculation of the components is illustrated by use of a small data set.

Effectiveness is defined by an equation as a figure of merit judging the opportunity for producing the intended results. The effectiveness equation is described in different formats (Blanchard 1995, Kececioglu 1995, Landers 1996, Pecht 1995, Raheja 1991). Each effectiveness element varies as a probability. Since components of the effectiveness equation have different forms, it varies from one writer to the next. Definitions of the effectiveness equation, and its components, generate many technical arguments. The major (and unarguable) economic issue is finding a system effectiveness value which gives the lowest long-term cost of ownership using life cycle costs (LCC) (Barringer 1996a and 1997) for the value received:

    System effectiveness = Effectiveness/LCC

Cost is a measure of resource usage. Lower cost is generally better than higher cost. Cost estimates never include all possible elements but hopefully include the most important elements.

Effectiveness is a measure of value received. Clements (1991) describes effectiveness as telling how well the product/process satisfies end user demands. Higher effectiveness is generally better than lower effectiveness. Effectiveness varies from 0 to 1 and rarely includes all value elements as many are too difficult to quantify.
One form is described by Berger (1993):

    Effectiveness = availability * reliability * maintainability * capability

In plain English, the effectiveness equation is the product of:
--the chance the equipment or system will be available to perform its duty,
--it will operate for a given time without failure,
--it is repaired without excessive lost maintenance time, and
--it can perform its intended production activity according to the standard.

Each element of the effectiveness equation requires a firm datum, which changes with name plate ratings, for a true value that lies between 0 and 1.

Berger's effectiveness equation (availability * reliability * maintainability * capability) is argued by some as flawed because it contains availability and components of availability (reliability and maintainability). Blanchard's effectiveness equation (availability * dependability * performance) has similar flaws. For any index to be successful, it must be understandable and creditable by the people who will use it. Most people understand availability and can quantify it. Few can quantify reliability or maintainability in terms everyone can understand. The effectiveness equation is simply a relative index for measuring how we are doing.

Consider these elements of the effectiveness equation for refineries and chemical plants. In many continuous process industries, availability is high (~85 to 98%), reliability is low (~0.001 to 10%) when measured against turnaround intervals, maintainability is high (~50 to 90%) when measured against the allowed time for repairs, and productivity is high (~60 to 90%). So what does the effectiveness equation tell about these conditions? The one element destroying effectiveness is the reliability component (Barringer 1996b), so it tells where to look for making improvements.

Can the effectiveness equation be used to benchmark one business to another? In theory yes, but in practice no. The practical problem lies in normalizing effectiveness data across companies and across business lines. For example, one plant may have an acceptable mission time for its equipment of one year, whereas a second plant may require a five year mission time because of its turnarounds. Similarly, one plant may set a repair time for a specific pump as 8 hours elapsed time for a two man crew and the second plant may allow 12 hours elapsed time for a two man crew. At best, the effectiveness equation is applicable within a company where similar rules are applied across operating plants and the cost structure is similar.

The importance of quantifying elements of the effectiveness equation (and their associated costs) is to find areas for improvement. For example, if availability is 98%, reliability is 70%, maintainability is 70%, and capability is 65%, the opportunity for improving capability is usually much greater than for improving availability.
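As a minimal sketch of Berger's form of the equation, the example values quoted above (98%, 70%, 70%, 65%) can be multiplied together to see how far the product falls below any single element. The function name `effectiveness` and the range check are illustrative assumptions, not part of the original paper.

```python
from math import prod


def effectiveness(availability, reliability, maintainability, capability):
    """Product of the four elements; each is expected to lie in [0, 1]."""
    elements = {
        "availability": availability,
        "reliability": reliability,
        "maintainability": maintainability,
        "capability": capability,
    }
    for name, value in elements.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    return prod(elements.values())


# Example values taken from the paragraph above.
example = {
    "availability": 0.98,
    "reliability": 0.70,
    "maintainability": 0.70,
    "capability": 0.65,
}
score = effectiveness(**example)
weakest = min(example, key=example.get)

print(f"effectiveness = {score:.3f}")
print(f"lowest element: {weakest}")
```

With these inputs the product is roughly 0.31, and the lowest element is capability, which is consistent with the text's point that capability offers more room for improvement than availability.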
Table 1 contains a simple data set used to illustrate how some abilities are calculated. Events are put into categories of up time and down time for a system. Because the data lacks specific failure details, the up time intervals are often considered as generic age-to-failure data. Likewise, the specific maintenance details are often considered as generic repair times. Add more details to the reports to increase their usefulness. This limited data can be helpful for understanding the effectiveness equation even though most plant level people do not acknowledge they have adequate data for analysis (Barringer 1995).

Table 1: Raw Data From Operating Logs (all values in wall clock hours)

| Start | End | Elapsed Time For Up Time | Elapsed Time For Down Time |
|---|---|---|---|
| 0 | 708.2 | 708.2 | |
| 708.2 | 711.7 | | 3.5 |
| 711.7 | 754.1 | 42.4 | |
| 754.1 | 754.7 | | 0.6 |
| 754.7 | 1867.5 | 1112.8 | |
| 1867.5 | 1887.4 | | 19.9 |
| 1887.4 | 2336.8 | 449.4 | |
| 2336.8 | 2348.9 | | 12.1 |
| 2348.9 | 4447.2 | 2098.3 | |
| 4447.2 | 4452 | | 4.8 |
| 4452 | 4559.6 | 107.6 | |
| 4559.6 | 4561.1 | | 1.5 |
| 4561.1 | 5443.9 | 882.8 | |
| 5443.9 | 5450.1 | | 6.2 |
| 5450.1 | 5629.4 | 179.3 | |
| 5629.4 | 5658.1 | | 28.7 |
| 5658.1 | 7108.7 | 1450.6 | |
| 7108.7 | 7116.5 | | 7.8 |
| 7116.5 | 7375.2 | 258.7 | |
| 7375.2 | 7384.9 | | 9.7 |
| 7384.9 | 7952.3 | 567.4 | |
| 7952.3 | 7967.5 | | 15.2 |
| 7967.5 | 8315.3 | 347.8 | |
| 8315.3 | 8317.8 | | 2.5 |
| | Total = | 8205.3 | 112.5 |
| | | MTBM = 683.8 | MTTR = 9.4 |
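The totals and means in Table 1 can be reproduced from the wall clock readings alone. The sketch below assumes the log strictly alternates between up and down intervals, starting with an up interval. The variable names are illustrative.

```python
# Wall clock hours at which the system changed state (from Table 1).
# Assumption: intervals alternate up, down, up, down, ... starting with "up".
boundaries = [
    0, 708.2, 711.7, 754.1, 754.7, 1867.5, 1887.4, 2336.8, 2348.9,
    4447.2, 4452, 4559.6, 4561.1, 5443.9, 5450.1, 5629.4, 5658.1,
    7108.7, 7116.5, 7375.2, 7384.9, 7952.3, 7967.5, 8315.3, 8317.8,
]

# Elapsed time of every interval, then split into up and down intervals.
intervals = [end - start for start, end in zip(boundaries, boundaries[1:])]
up_times = intervals[0::2]
down_times = intervals[1::2]

total_up = sum(up_times)
total_down = sum(down_times)
mtbm = total_up / len(up_times)        # mean time between maintenance actions
mttr = total_down / len(down_times)    # mean down time per maintenance action

print(f"up time total   = {total_up:.1f} h over {len(up_times)} intervals")
print(f"down time total = {total_down:.1f} h over {len(down_times)} intervals")
print(f"MTBM = {mtbm:.1f} h, MTTR = {mttr:.1f} h")
```

The results agree with the table: 8205.3 hours up, 112.5 hours down, MTBM of about 683.8 hours and MTTR of about 9.4 hours.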

Availability deals with the duration of up-time for operations and is a measure of how often the system is alive and well. It is often expressed as (up-time)/(up-time + downtime) with many different variants. Up-time and downtime refer to dichotomized conditions. Up-time refers to a capability to perform the task and downtime refers to not being able to perform the task, i.e., up-time is simply not downtime. Also availability may be the product of many different terms such as:

    A = A_hardware * A_software * A_humans * A_interfaces * A_process

and similar configurations.

Availability issues deal with at least three main factors (Davidson 1988) for: 1) increasing time to failure, 2) decreasing downtime due to repairs or scheduled maintenance, and 3) accomplishing items 1 and 2 in a cost effective manner. As availability grows, the capacity for making money increases because the equipment is in service a larger percent of time.

Three frequently used availability terms (Ireson 1996) are explained below.

Inherent availability, as seen by maintenance personnel (excludes preventive maintenance outages, supply delays, and administrative delays), is defined as:

    A_i = MTBF/(MTBF + MTTR)

Achieved availability, as seen by the maintenance department (includes both corrective and preventive maintenance but does not include supply delays and administrative delays), is defined as:

    A_a = MTBM/(MTBM + MAMT)

where MTBM is mean time between corrective and preventive maintenance actions and MAMT is the mean active maintenance time.

Operational availability, as seen by the user, is defined as:

    A_o = MTBM/(MTBM + MDT)

where MDT is mean down time.

A few key words describing availability in quantitative terms are: on-line time, stream factor time, lack of downtime, and a host of local operating terms including a minimum value for operational availability.
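All three availability definitions above share the same ratio form and differ only in which "mean up" and "mean down" terms are used. The sketch below illustrates this with a single helper. The function names are assumptions, and the inputs are the MTBM and mean down time from Table 1, so the result is an operational availability.

```python
HOURS_PER_YEAR = 8760


def availability(mean_up_time, mean_down_time):
    """Generic ratio: mean up / (mean up + mean down).

    Inherent:    (MTBF, MTTR)
    Achieved:    (MTBM, MAMT)
    Operational: (MTBM, MDT)
    """
    if mean_up_time < 0 or mean_down_time < 0:
        raise ValueError("times must be non-negative")
    return mean_up_time / (mean_up_time + mean_down_time)


def annual_downtime_hours(avail):
    """Expected downtime per year for a continuous process."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Operational availability using the Table 1 means (683.8 h up, 9.4 h down).
a_o = availability(683.8, 9.4)
print(f"A_o = {a_o:.3f}")
print(f"expected downtime = {annual_downtime_hours(a_o):.0f} h/yr")
```

Because availability and unavailability sum to 1, the second helper simply scales the unavailability by the hours in a year.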
Even though equipment may not be in actual operation, the production department wants it available at least a specified amount of time to complete its tasks, and thus the need for a minimum availability value.

An example of 98% availability for a continuous process says to expect up-time of 0.98*8760 = 8584.8 hr/yr and downtime of 0.02*8760 = 175.2 hr/yr as availability + unavailability = 1.

Now, using the data set provided above in Table 1, the dichotomized availability is 98.6% based on up time = 8205.3 hours and downtime = 112.5 hours. Of course the dichotomized view of availability is simplistic and provides worst case availability numbers. Not all equipment in a train provides binary results of only up or only down; sometimes it is partially up or partially down. Clearly the issue is correctly defining failure.

In the practical world, complexities exist in the definitions for when only some of the equipment is available in a train, and the net availability is less than the ideal availability, i.e., a cutback in output occurs because of equipment failure which decreases the idealized output from, say, 95% to a lower value such as 87% when failures are correctly defined. A key measure is defining the cutback (and thus loss of availability from a dichotomized viewpoint) when the cutback declines to a level causing financial losses; this is the economic standard for failure. In short, the area under the availability curve can be summed to calculate a practical level of availability and generate higher values for availability than when only dichotomized values are used.

Lack of availability is a problem related primarily to failures of equipment. But the root cause of the failure may lie in different areas than initially expected. Often deterioration, leading to economic failure, causes conflicts in the definitions of reliability, maintainability, and capability; real life issues are rarely simple and independent. For production purposes, a system must be fully available (ready for service) and reliable (absence of failures) to produce effective results.

Reliability deals with reducing the frequency of failures over a time interval and is a measure of the probability for failure-free operation during a given interval, i.e., it is a measure of success for a failure-free operation. It is often expressed as

    R(t) = exp(-t/MTBF) = exp(-λt)

where λ is the constant failure rate and MTBF is the mean time between failures. MTBF measures the time between system failures and is easier to understand than a probability number. For exponentially distributed failure modes, MTBF is a basic figure of merit for reliability (failure rate, λ, is the reciprocal of MTBF). For a given mission time, to achieve high reliability, a long MTBF is required.
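To make the last point concrete, the exponential expression can be inverted to ask how long an MTBF is needed for a target reliability over a given mission time. This is a minimal sketch under the constant failure rate assumption stated above. The function names and the 90% and 99% targets are illustrative choices, not values from the paper.

```python
import math


def reliability(mission_time, mtbf):
    """R(t) = exp(-t / MTBF) for a constant failure rate."""
    return math.exp(-mission_time / mtbf)


def required_mtbf(mission_time, target_reliability):
    """Invert R(t): MTBF = -t / ln(R). Units follow mission_time."""
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target reliability must be strictly between 0 and 1")
    return -mission_time / math.log(target_reliability)


# Hypothetical targets for a one year mission (time unit: years).
for target in (0.90, 0.99):
    mtbf = required_mtbf(1.0, target)
    print(f"R = {target:.2f} over 1 year needs MTBF of about {mtbf:.1f} years")

# When the mission time equals the MTBF, reliability is exp(-1), about 37%.
print(f"R(t = MTBF) = {reliability(1.0, 1.0):.3f}")
```

The required MTBF grows quickly as the target approaches 1, which is why high reliability over long missions demands large mean times to failure.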
Also reliability may be the product of many different reliability terms such as

    R = R_utilities * R_feed-plant * R_processing * R_packaging * R_shipping

and similar configurations.

To the user of a product, reliability is measured by a long, failure-free operation. Long periods of failure-free operation result in increased productive capability while requiring fewer spare parts and less manpower for maintenance activities, which results in lower costs. To the supplier of a product, reliability is measured by completing a failure-free warranty period under specified operating conditions with few failures during the design life of the product.

Improving reliability occurs at an increased capital cost but brings with it the expectation for improving availability, decreasing downtime, smaller maintenance costs, and improved secondary failure costs, and it results in better chances for making money because the equipment is free from failures for longer periods of time.

While general calculations of reliability pertain to constant failure rates, detailed calculations of reliability are based on consideration of the failure mode, which may be infant mortality (decreasing failure rates with time), chance failure (constant failure rates with time), or wear-out (increasing failure rates with time).
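The product form above treats the plant as a chain in which every stage must succeed. A small sketch shows how quickly the product drops below its weakest stage. The stage reliabilities here are hypothetical numbers chosen only for illustration; the paper gives no values for them.

```python
from math import prod

# Hypothetical stage reliabilities over the same mission time (assumed values).
stage_reliability = {
    "utilities": 0.99,
    "feed_plant": 0.95,
    "processing": 0.90,
    "packaging": 0.97,
    "shipping": 0.98,
}

# Series system: all stages must operate without failure.
system_reliability = prod(stage_reliability.values())
weakest_stage = min(stage_reliability, key=stage_reliability.get)

print(f"system reliability = {system_reliability:.3f}")
print(f"weakest stage: {weakest_stage} "
      f"({stage_reliability[weakest_stage]:.2f})")
```

With these assumed inputs the system value is about 0.80, below the 0.90 of the weakest stage, which is the usual behavior of a series product of probabilities.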

A few key words describing reliability in quantitative terms are: mean time to failure, mean time between failures, mean time between/before maintenance actions, mean time between/before repairs, mean life of units in counting units such as hours or cycles, failure rates, and the maximum number of failures in a specified time interval.

An example of a mission time of one year with equipment which has a 30 year mean time to failure gives a reliability of 96.72%, which is the probability of successfully completing the one year time interval without failure. The probability for failure is 3.278% as reliability + unreliability = 1. For reliability issues, defining the mission time is very important to get valid answers. Notice from the example that high reliability for mission times of one year or more requires high inherent reliability (i.e., large mean times to failure); often the inherent reliability is not achieved due to operating errors and maintenance errors.

The data in Table 1 shows the mean time between maintenance actions is 683.8 hours. Calculate the system reliability using the exponential distribution described above and a mission time of one year. The system has a reliability of exp(-8760/683.8) = 0.00027%. The reliability value is the probability of completing the one year mission without failure. In short, the system is highly unreliable (for a one year mission time) and maintenance actions are in high demand as the system is expected to have 8760/683.8 = 12.8 maintenance actions per year!

The above calculations for reliability were driven by arithmetic calculations. More accurate projections are found by building a probability chart from the data in Table 1 using WinSMITH Weibull software (Fulton 1996). Figure 1 shows the mean time between maintenance events is 730 hours. Ninety-eight percent of all the up time will lie between 7.3 hours and 3362.3 hours.
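The worked numbers above can be re-derived with a few lines of arithmetic. The sketch assumes a pure exponential model (Weibull with beta = 1) and uses a characteristic life of 730 hours, as read from the fitted plot. The helper names are illustrative, and the percentile values are approximations of what the plotting software reports rather than an exact reproduction.

```python
import math


def reliability(mission_time, mean_life):
    """Exponential reliability: exp(-t / mean life)."""
    return math.exp(-mission_time / mean_life)


def exponential_percentile(fraction, mean_life):
    """Time by which the given fraction of intervals has ended."""
    return -mean_life * math.log(1.0 - fraction)


# One year mission with a 30 year mean time to failure.
r_one_year = reliability(1.0, 30.0)
print(f"R = {r_one_year:.2%}, unreliability = {1 - r_one_year:.3%}")

# One year mission (8760 h) with the Table 1 MTBM of 683.8 h.
r_table1 = reliability(8760, 683.8)
actions_per_year = 8760 / 683.8
print(f"R = {r_table1:.5%}, maintenance actions per year = {actions_per_year:.1f}")

# Percentiles of up time for a mean of 730 h (exponential assumption).
p01 = exponential_percentile(0.01, 730)
b10 = exponential_percentile(0.10, 730)
median = exponential_percentile(0.50, 730)
p99 = exponential_percentile(0.99, 730)
print(f"1%: {p01:.1f} h, 10%: {b10:.1f} h, median: {median:.0f} h, 99%: {p99:.0f} h")
```

The 1% and 99% points bracket the middle ninety-eight percent of up times. Small differences from the plotted values are expected because the software fit is not constrained to round numbers.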
[Figure 1: Probability Plot Of Up Time. Weibull probability plot (W/rr) of elapsed time between maintenance events. Horizontal axis: Time (hours), 1 to 10000. Vertical axis: Occurrence CDF (%). Fit: Eta = 729.9988, Beta = 1, r^2 = 1, n/s = 12/0.]

Ten percent of all up times will be less than 76.9 hours. The median up time is 506 hours.

So how can high availability be achieved with systems requiring many maintenance actions? The maintenance actions must be performed very quickly to minimize outages! This leads to pressures for establishing world class maintenance operations. A better way to solve the problem is to reduce the number of failures; thus demands for world class maintenance operations are avoided and costs are decreased, particularly when life cycle costs drive the actions. Remember failures carry hidden costs resulting from the hidden factories associated with production losses for disposal of scrap and the slow output incurred while reestablishing steady state conditions; the lost time may be 1.5 to 5 times the obvious lost time costs.

The real issue for studying reliability is driven by a simple concept called money, particularly when the cost of unreliability (Barringer 1996) is identified and used for motivating trade-off studies. High reliability (few failures) and high maintainability (predictable maintenance times) tend toward highly effective systems.

Maintainability deals with the duration of maintenance outages, or how long it takes to achieve (ease and speed) the maintenance actions compared to a datum. The datum includes that maintenance (all actions necessary for retaining an item in, or restoring an item to, a specified, good condition) is performed by personnel having specified skill levels, using prescribed procedures and resources, at each prescribed level of maintenance. Maintainability characteristics are usually determined by equipment design, which sets maintenance procedures and determines the length of repair times.

The key figure of merit for maintainability is often the mean time to repair (MTTR) and a limit for the maximum repair time. Qualitatively it refers to the ease with which hardware or software is restored to a functioning state.
Quantitatively it has probabilities and is measured based on the total down time for maintenance including all time for: diagnosis, trouble shooting, tear-down, removal/replacement, active repair time, verification testing that the repair is adequate, delays for logistic movements, and administrative maintenance delays. It is often expressed as

    M(t) = 1 - exp(-t/MTTR) = 1 - exp(-µt)

where µ is the constant maintenance rate and MTTR is the mean time to repair. MTTR is an arithmetic average of how fast the system is repaired and is easier to visualize than the probability value.

Note the simple, easy to use criterion shown above is frequently expressed in exponential repair times. A better and more accurate formula requires use of a different equation for the very cumbersome log-normal distribution of repair times, describing maintenance times which are skewed to the right.

The maintainability issue is to achieve short repair times for keeping availability high so that downtime of productive equipment is minimized for cost control when availability is critical.

An example of a stated maintainability goal is a 90% probability that maintenance repair times will be completed in 8 hours or less with a maximum repair time of 24 hours. This requires a system MTTR of 3.48 hours. Also the cap of 24 hours (99.9% of repairs will be accomplished in this time, or less) requires control of three main items of downtime: 1) active repair time (a function of design, training, and skill of maintenance personnel), 2) logistic time (time lost for supplying the replacement parts), and 3) administrative time (a function of the operational structure of the organization). The probability for not meeting the specified 8 hour repair interval in this example is 10% based on a MTTR of 3.48 hours as maintainability + unmaintainability = 1.

Data in Table 1 shows mean down time due to maintenance actions is 9.4 hours. Calculate the system maintainability using the exponential distribution and an allowed repair time of 10 hours. The system has a maintainability of 1 - exp(-10/9.4) = 65.5%. The maintainability value is the probability of completing the repairs in the allowed interval of 10 hours. In short, the system has a modest maintainability value (for the allowed repair interval of 10 hours)!

The above calculations for maintainability were driven by arithmetic calculations. More accurate projections are found by building a probability chart from the data in Table 1. Figure 2 shows the mean down time for repairs is 10.0 hours. Ninety-eight percent of all the down time for maintenance will lie between 0.1 hours and 45.6 hours. Ten percent of all down times will be less than 1.1 hours. The median down time is 7.0 hours.

[Figure 2: Probability Plot of Maintenance Down Time. Weibull probability plot (W/rr) of elapsed time for maintenance actions. Horizontal axis: Time (hours), 0.1 to 100. Vertical axis: Occurrence CDF (%). Fit: Eta = 10.01688, Beta = 1.008, r^2 = 0.999, n/s = 12/0.]

A special note of caution about the exponential distribution in Figure 2: most repair times are not exponentially distributed; they are usually log-normally distributed. However, do not worry about the distribution too much as WinSMITH Weibull software allows many different distributions to be studied with ease using the New Weibull Handbook (Abernethy 1996). For engineers, the software is very helpful because their time can be used to fix things rather than worrying about arcane statistical matters.

From the data in Figure 1, MTBM = 730.0 hours, and from the data in Figure 2, MDT = 10.0 hours, the operational availability can be calculated: A_o = 730/(730 + 10) = 98.6%, which is the same value found from the calculation for (up time)/(up time + down time).

High availability (high up-time), high reliability (few failures), and high maintainability (predictable and short maintenance times) tend toward highly effective systems if capability is also maintained at a high level.

Capability deals with productive output compared to inherent productive output, which is a measure of how well the production activity is performed compared to the datum. This index measures the system's capability to perform the intended function on a system basis. Often the term is synonymous with productivity, which is the product of efficiency multiplied by utilization. Efficiency measures the productive work output versus the work input. Utilization is the ratio of time spent on productive efforts to the total time consumed.

For example, suppose efficiency is 80% because of wasted labor/scrap generated, and utilization is 82.19% because the operation is operated 300 days per year out of 365 days. The capability is 0.8*0.8219 = 65.75%.

These numbers are frequently generated by accounting departments for production departments as a key index of how they are doing. Thus these calculations need few explanations.

System effectiveness equations (Effectiveness/LCC) are helpful for understanding benchmarks and past, present, and future status, as shown in Figure 3, for understanding trade-off information.

| Case | Availability | Reliability | Maintainability | Capability | Effectiveness | LCC |
|---|---|---|---|---|---|---|
| Last | 0.95 | 0.3 | 0.7 | 0.7 | 0.14 | 80 |
| New | 0.95 | 0.4 | 0.7 | 0.8 | 0.22 | 100 |
| Best | 0.98 | 0.6 | 0.7 | 0.6 | 0.25 | 95 |

[Figure 3: Benchmark Data Shown In Trade-Off Format. Plot of LCC (vertical axis) against Effectiveness (horizontal axis) showing the Last, New, and Best cases with points labeled A, B, and C, a "Trade-off Area", and corners marked "Worst", "Best", and "?".]
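The arithmetic in the maintainability and capability discussions, and the effectiveness column of Figure 3, can be re-checked with a short script. This is a sketch under the exponential repair-time assumption used in the text. The function names are illustrative, and dividing effectiveness by LCC for each case is simply the system effectiveness ratio defined at the start of the paper.

```python
import math


def maintainability(allowed_time, mttr):
    """M(t) = 1 - exp(-t / MTTR): probability repair finishes within t."""
    return 1.0 - math.exp(-allowed_time / mttr)


def mttr_for_goal(allowed_time, probability):
    """MTTR needed so that M(allowed_time) equals the stated probability."""
    return -allowed_time / math.log(1.0 - probability)


# Table 1 case: 10 h allowed, 9.4 h mean down time.
m_table1 = maintainability(10, 9.4)

# Stated goal: 90% of repairs within 8 h; then check the 24 h cap.
mttr_goal = mttr_for_goal(8, 0.90)
m_cap = maintainability(24, mttr_goal)

# Operational availability from the fitted means, and the capability example.
a_o = 730 / (730 + 10)
capability = 0.80 * (300 / 365)

print(f"M(10 h) = {m_table1:.1%}, MTTR for goal = {mttr_goal:.2f} h, "
      f"M(24 h) = {m_cap:.1%}")
print(f"A_o = {a_o:.1%}, capability = {capability:.2%}")

# Figure 3 inputs: (availability, reliability, maintainability, capability, LCC).
figure3 = {
    "Last": (0.95, 0.3, 0.7, 0.7, 80),
    "New": (0.95, 0.4, 0.7, 0.8, 100),
    "Best": (0.98, 0.6, 0.7, 0.6, 95),
}
system_effectiveness = {}
for case, (a, r, m, c, lcc) in figure3.items():
    eff = a * r * m * c
    system_effectiveness[case] = eff / lcc
    # Note: the product for "New" is about 0.21, while Figure 3 lists 0.22.
    print(f"{case}: effectiveness = {eff:.3f}, per unit LCC = {eff / lcc:.5f}")

best_case = max(system_effectiveness, key=system_effectiveness.get)
print(f"highest effectiveness per unit LCC: {best_case}")
```

The computed MTTR for the 90% in 8 hours goal is about 3.47 hours, close to the 3.48 hours quoted, and the same MTTR gives the 99.9% figure for the 24 hour cap.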

The lower right hand corner of Figure 3 brings much joy and happiness, often described as bang for the buck (Weisz 1996). The upper left hand corner brings much grief. The remaining two corners raise questions about worth and value.

In summary, the elements of the effectiveness equation provide enlightenment about how things work in a continuous processing plant. Clues are provided in the effectiveness equation for where corrective action may be particularly helpful. It is important to understand both reliability and maintainability along with traditional information about availability and capability. In all cases, alternatives should be considered, based on life cycle costs, for ranking the high cost of problems so the important issues can be identified for corrective action.

References

Abernethy, Dr. Robert B. (1996), The New Weibull Handbook, 2nd edition, published by the author, phone: 561-842-4082.

Barringer, H. Paul and David P. Weber (1995), Where Is My Data For Making Reliability Improvements?, Fourth International Conference on Process Reliability sponsored by Hydrocarbons Processing and Gulf Publishing Company, Houston, TX.

Barringer, H. Paul and David P. Weber (1996a), Life Cycle Cost Tutorial, Fifth International Conference on Process Reliability sponsored by Hydrocarbons Processing and Gulf Publishing Company, Houston, TX.

Barringer, H. Paul (1996b), Practical Reliability Tools For Refineries and Chemical Plants, National Petroleum Refiners Association Maintenance Conference and Exhibition, Nashville, TN.

Barringer, H. Paul (1996), An Overview Of Reliability Engineering Principles, Energy Week 1996, Sponsored by ASME & API and Organized by PennWell Conferences, Houston, TX.

Barringer, H. Paul (1997), Life Cycle Costs & Reliability For Process Equipment, Energy Week 1997, Sponsored by ASME & API and Organized by PennWell Conferences, Houston, TX.

Berger, Gene and Herzl Marouni (1993), ASQC Section 1405 Certified Reliability Refresher Course, Published by Gene Berger, Houston, TX.

Blanchard, B. S., Dinesh Verma, Elmer L. Peterson (1995), Maintainability: A Key to Effective Serviceability and Maintenance Management, Prentice-Hall, Englewood Cliffs, NJ.

Clements, Richard Barrett (1991), Handbook of Statistical Methods in Manufacturing, Prentice Hall, Englewood Cliffs, NJ.

Davidson, John (1988), The Reliability of Mechanical Systems, Mechanical Engineering Publications Limited for The Institution of Mechanical Engineers, London.

Fulton, Wes (1996), WinSMITH Weibull probability software, version 1.1P, Fulton Findings, 1251 W. Sepulveda Blvd., #800, Torrance, CA 90502, phone 310-548-6358.

Ireson, W. Grant, Clyde F. Coombs, Jr., Richard Y. Moss (1996), Handbook of Reliability Engineering and Management, 2nd edition, McGraw-Hill.

Kececioglu, Dimitri (1995), Maintainability, Availability, & Operational Readiness Engineering, Prentice Hall PTR, Upper Saddle River, NJ.

Landers, Richard R. (1996), Product Assurance Dictionary, Marlton Publishers, 169 Vista Drive, Marlton, NJ 08053.

Pecht, Michael (1995), Product Reliability, Maintainability, and Supportability Handbook, CRC Press, New York.

Raheja, Dev G. (1991), Assurance Technologies, McGraw-Hill, Inc., NY.

Weisz, John (1996), An Integrated Approach to Optimizing System Cost Effectiveness, 1996 Tutorial Notes Annual Reliability and Maintainability Symposium, available from Evans Associates, 804 Vickers Avenue, Durham, NC 27701.

Biographic information

H. Paul Barringer

Manufacturing, engineering, and reliability consultant and author of the basic reliability training course Reliability Engineering Principles. More than thirty-five years of engineering and manufacturing experience in design, production, quality, maintenance, and reliability of technical products. Contributor to The New Weibull Handbook, a reliability engineering text published by Dr. Robert B. Abernethy. Named as inventor in six U.S.A. Patents. Registered Professional Engineer in Texas. Education includes an MS and BS in Mechanical Engineering from North Carolina State University, and participation in Harvard University's three week Manufacturing Strategy conference.

Visit the world wide web site at http://www.barringer1.com for other background details or send e-mail to hpaul@barringer1.com concerning LCC or reliability issues.

HPB 1/2/97