BAYES' THEOREM IN DECISION MAKING
Reasoning from Effect to Cause

Jack V. Michaels, PE, CVS
Value Engineering Department
Martin Marietta Orlando Aerospace

Copyright 1987 MARTIN MARIETTA CORPORATION. Reprinted with permission of Martin Marietta Corporation.

ABSTRACT

Thomas Bayes (1702-1761) was an English mathematician who developed the first precise, quantitative mathematical expression for inductive inference. His theorem on inverse probability, published posthumously in 1763, yields the probability that an event that has already occurred may have occurred in a particular way, and affords a powerful tool for determining root cause. Although Bayes' Theorem evolved from postulates of classical probability, it has been the subject of controversy because of subjectivity in viewing prior events. There has never been a question, however, about the mathematical validity of the theorem; where empirical data or expert opinion are available, the theorem provides real-world conclusions. The essence of Bayes' Theorem is that the probability of occurrence of one particular hypothesis is equal to its conditional probability divided by the sum of the conditional probabilities of all possible hypotheses. This paper starts with a review of probability fundamentals to ensure a firm footing for understanding Bayes' Theorem. The use of the theorem is illustrated with increasingly complex applications to ensure the reader's grasp of its versatility and potential, and to provide models for applications unique to the reader.

INTRODUCTION

The applications of Bayes' Theorem of Inverse Probability range from medicine, science, engineering, and manufacturing through economics, politics, and military strategy. Judicious use of Bayes' Theorem greatly improves the quality of decisions, especially where root causes of discerned effects are to be deduced, or where associated risks of alternative approaches are to be evaluated. Bayes' Theorem employs both a priori probability and a posteriori probability to evaluate causal hypotheses. Its conclusions are both precise mathematically and tempered by empirical data. It is controversial only because of the subjective nature of assigning a priori probabilities to events yet to take place.

This paper presents a concise overview of probability fundamentals leading to the definition of Bayes' Theorem. A number of applications are described to show the insight the theorem provides for dedicated users.

PROBABILITY FUNDAMENTALS

The probability of an event is a number from 0 to 1 that indicates the relative frequency with which the event would occur if the trial were repeated a large number of times. From the mathematical viewpoint, assignment of probabilities of occurrence to events is axiomatic; such assignments are arbitrarily based on certain immutable postulates. For example, the sum of the probabilities of occurrence of mutually exclusive and collectively exhaustive events must total 1. Such postulates are the bases for the definitions given in Table I, as well as for the rules for calculating probability values that follow.

Addition Rule

The probability of occurrence of two or more events is given by:

P(E+F) = P(E) + P(F) - P(EF)    (1)

where P(E+F) denotes the probability of occurrence of E or F or both, and P(EF) the joint probability of occurrence of E and F. For example, if P(E) = 0.6 and P(F) = 0.2, then:

P(E+F) = 0.6 + 0.2 - 0.12 = 0.68    (2)

Where E and F are mutually exclusive events:

P(EF) = 0    (3)
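The addition-rule arithmetic above is easy to verify with a few lines of Python (a minimal sketch; the function name p_union is ours, not from the paper):

```python
def p_union(p_e, p_f, p_ef):
    """Addition rule (Equation 1): P(E+F) = P(E) + P(F) - P(EF)."""
    return p_e + p_f - p_ef

# Independent events with P(E) = 0.6 and P(F) = 0.2, so P(EF) = 0.12:
print(round(p_union(0.6, 0.2, 0.12), 2))   # 0.68

# Mutually exclusive events, where P(EF) = 0 (Equation 3):
print(round(p_union(0.6, 0.2, 0.0), 2))    # 0.8
```

The second call shows that for mutually exclusive events the joint term drops out and the probabilities simply add.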
When E and F are mutually exclusive, Equation 1 becomes simply:

P(E+F) = P(E) + P(F)    (4)

where P(E+F) now denotes the probability of occurrence of E or F, but not both. In this case the example yields:

P(E+F) = 0.6 + 0.2 = 0.8    (5)

TABLE I. Some Probability Definitions

A Posteriori Probability - Probability determined after trial or experimental information is obtained. Also called empirical probability.
A Priori Probability - Probability determined before any trial or experimental information is obtained. Also called mathematical probability.
Conditional Probability - Probability of occurrence of an event that depends on the occurrence of another event in the sample space.
Collectively Exhaustive Events - Totality of all possible outcomes of a trial.
Compound Events - Events composed of a group of mutually exclusive events, and whose probability of occurrence is the sum of the probabilities of occurrence of the mutually exclusive events.
Dependent Events - Possible outcomes of a trial whose individual probabilities of occurrence depend on outcomes of other events.
Equally Probable Events - Possible outcomes of a trial whose individual probabilities of occurrence are equal.
Independent Events - Possible outcomes of a trial whose individual probabilities of occurrence are independent of the occurrence of other events.
Joint Probability - Probability of occurrence of two or more events.
Marginal Probability - A priori probability of occurrence of dependent events.
Mutually Exclusive Events - Events whose occurrences preclude the occurrence of other events within the same sample space (e.g., heads or tails).
Sample Space - Set of collectively exhaustive events.
Trial - Endeavor having more than one possible outcome.

Product Rule-Independent Events

The joint probability of occurrence of two or more independent events is given by:

P(EF) = P(E)P(F)    (6)

where P(EF) denotes the joint probability of occurrence of E and F. Again, if P(E) = 0.6 and P(F) = 0.2, then:

P(EF) = (0.6)(0.2) = 0.12

Product Rule-Dependent Events

The probability of occurrence of two dependent events is given by:

P(EF) = P(E)P(F|E)    (7)

P(EF) = P(F)P(E|F)    (8)

where P(F|E) denotes the conditional probability of F given the occurrence of E, and P(E|F) the conditional probability of E given the occurrence of F. For three dependent events the rule extends to:

P(EFG) = P(E)P(F|E)P(G|EF)    (9)

Consider the random selection of three parts from a lot of five containing two defective parts, for example. What is the joint probability of all three selected parts being good? The joint probability of the first three parts selected being good is the marginal probability of the first part selected being good multiplied by the successive conditional probabilities:

P(G1G2G3) = P(G1)P(G2|G1)P(G3|G1G2)    (10)

The marginal probability of the first part selected being good is simply the number of good parts divided by the total number of parts, or:

P(G1) = 3/5    (11)

The conditional probability of the second part selected being good, given the first part selected is good, is simply the ratio of the remaining number of good parts to the number of parts remaining, or:

P(G2|G1) = 2/4 = 1/2    (12)

Similarly, the conditional probability of the third part being good given the first two parts selected are good is:

P(G3|G1G2) = 1/3    (13)

Substituting the values from Equations 11, 12, and 13 in Equation 10 yields:

P(G1G2G3) = (3/5)(1/2)(1/3) = 1/10    (14)

Elimination Rule

The elimination rule provides a generalization of the product rule for any number of dependent
events. It specifies the probability of occurrence of an event that can conditionally occur in several ways, and is so called because it eliminates noncontributing factors. It sets the stage for defining Bayes' Theorem. If E1, E2, ..., En are events in a sample space, then the probability of occurrence of an event F in the sample space can be expressed by:

P(F) = SUM(i=1 to n) P(Ei)P(F|Ei)    (15)

where P(F|Ei) denotes the probability of F given the occurrence of Event Ei. For example, the probabilities of Events E1, E2, and E3 are 0.50, 0.30, and 0.10, and the probabilities of Event F, given the occurrence of Events E1, E2, and E3, are 0.08, 0.05, and 0.02. Applying Equation 15 yields:

P(F) = (0.50)(0.08) + (0.30)(0.05) + (0.10)(0.02) = 0.057    (16)

BAYES' THEOREM

Bayes' Theorem is also called the Theorem of Inverse Probability because it provides the probability that an event that has already occurred might have occurred in a particular way, which is tantamount to reasoning from effect to cause. Stated more precisely, Bayes' Theorem provides probabilities of occurrence for hypothetical causes by modifying a priori probabilities for events on the basis of trial evidence, expert opinion, or other measures of a posteriori probability. If a set of events is mutually exclusive and collectively exhaustive, Bayes' Theorem can be expressed as follows:

P(Hj|E) = P(Hj)P(E|Hj) / SUM(i=1 to n) P(Hi)P(E|Hi)    (17)

where P(Hj|E) denotes the probability that hypothesis Hj caused the event, out of all possible hypothetical causes Hi, given the event has occurred.

It is convenient to view Bayes' Theorem in the probability tree form shown in Figure 1. The probability that Event E was reached by the j-th branch of the tree, given that it was reached through one of its n branches, is the ratio of the probability associated with the j-th branch to the sum of the probabilities associated with all n branches of the tree.

Figure 1. Probability Tree Diagram for Bayes' Theorem

APPLICATIONS

The applications that follow are increasingly complex to ensure a firm grasp of principles and appreciation of the potential offered by Bayes' Theorem toward solving real-world problems. The reader is urged to consider the conclusions one might reach without the causal insight the theorem provides.

Cost Estimating

A government agency wishes to procure a new data processing facility and needs to know the most likely turn-key cost for budgeting. Data given in Table II relate cost hypotheses to their respective probabilities as estimated by the agency's staff and concluded from informal quotations by potential suppliers.

TABLE II. Cost Estimating Example Data

Cost Hypotheses    Staff Probability    Supplier Probability
$5,000,000         0.60                 0.40
$4,500,000         0.25                 0.30
$4,000,000         0.10                 0.20
$3,500,000         0.05                 0.10

The concern is the likelihood that the facility might cost $5,000,000. The issue of staff conservatism and supplier optimism can be resolved by using Bayes' Theorem. Applying Equation 17 for the probability of the facility costing $5,000,000 yields:

P(5,000,000|Q) = (0.60)(0.40)/[(0.60)(0.40) + (0.25)(0.30) + (0.10)(0.20) + (0.05)(0.10)] = 0.71    (18)

where P(5,000,000|Q) denotes the probability of the facility costing $5,000,000 given the supplier quotations. The answer is more conservative than the most conservative staff opinion (Table II).

Game of Chance

Each of four urns is to be sampled once by the player. The first urn contains one red ball and two white balls, the second one red ball and three white balls, the third one red ball and four white balls, and the fourth one red ball and five white balls. The selection of urns is equally probable. The a priori probabilities of picking a red ball from the respective urns are:

P1 = 1/3, P2 = 1/4, P3 = 1/5, P4 = 1/6    (19)

The player picked a red ball; what is the probability that the second urn, U2, was selected? Since the selections of urns are equally probable events, the marginal probability of selecting any urn, Ui, is:

P(Ui) = 1/4    (20)

Applying Equation 17 for P(U2|R), which denotes the probability of having selected U2 given a red ball was picked, yields:

P(U2|R) = (1/4)(1/4)/[(1/4)(1/3 + 1/4 + 1/5 + 1/6)] = 15/57 = 0.26    (21)

Defective Components

Identical components are procured from three vendors for use in the next higher assembly of a production process. Forty-five percent of the components are procured from Vendor 1, 30 percent from Vendor 2, and 25 percent from Vendor 3. The respective percentages of defectives are 6, 3, and 2. What is the a priori probability of using a defective component in the next higher assembly, and given that a defective component is used, what are the probabilities that it came from the respective vendors?

The a priori probabilities of using a component from the respective vendors, Vi, are given as:

P(V1) = 0.45, P(V2) = 0.30, P(V3) = 0.25    (22)

The conditional probabilities of using a defective component, D, given components from the respective vendors, Vi, are also derived:

P(D|V1) = 0.06, P(D|V2) = 0.03, P(D|V3) = 0.02    (23)

Applying Equation 15 yields the a priori probability of using a defective component, D:

P(D) = (0.45)(0.06) + (0.30)(0.03) + (0.25)(0.02) = 0.041    (24)

Given a defective component, D, the probability that it came from a particular vendor, Vi, can be derived using Equation 17:

P(V1|D) = (0.45)(0.06)/[(0.45)(0.06) + (0.30)(0.03) + (0.25)(0.02)] = 0.66    (25)

P(V2|D) = (0.30)(0.03)/[(0.45)(0.06) + (0.30)(0.03) + (0.25)(0.02)] = 0.22    (26)

P(V3|D) = (0.25)(0.02)/[(0.45)(0.06) + (0.30)(0.03) + (0.25)(0.02)] = 0.12    (27)

Environmental Protection

Pollution detection devices of the environmental protection agency of a certain state can detect excessive amounts of pollutants emitted by factories with a probability of 0.90, with a probability of 0.20 that factories not exceeding limits will fail the test. The issue is whether to procure devices with a detection probability of 0.99, even though the increased sensitivity will increase the false alarm probability to 0.22, to apprehend more violators of state statutes. It is assumed 30 percent of the factories in the state emit excessive pollutants. Figure 2 illustrates the probability tree approach to the problem. Hypotheses H1 and H2 are the fractions of violators and non-violators, and Event E1 is the state of detecting excessive emissions; E2 is the state of detecting nonexcessive emissions and indicating excessive (i.e., false alarm).
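The same Equation 17 arithmetic recurs in each of these examples. As a quick check, a short Python helper (a sketch; the function name is ours, not from the paper) reproduces the vendor probabilities of Equations 25 through 27:

```python
def posterior(priors, likelihoods):
    """Equation 17: P(Hj|E) = P(Hj)P(E|Hj) / sum_i P(Hi)P(E|Hi)."""
    branches = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(branches)
    return [b / total for b in branches]

# Defective Components example, Equations 25 through 27:
post = posterior([0.45, 0.30, 0.25], [0.06, 0.03, 0.02])
print([round(p, 2) for p in post])   # [0.66, 0.22, 0.12]
```

Each list element is one branch of the probability tree of Figure 1; the same helper applies to the tree of Figure 2 by passing priors [0.30, 0.70] with the appropriate detection and false-alarm probabilities.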
Figure 2. Environmental Protection Example (initial conditions: (0.30)(0.90) = 0.27 and (0.70)(0.20) = 0.14; proposed conditions: (0.30)(0.99) = 0.297 and (0.70)(0.22) = 0.154)

Under the initial conditions, the probability that a factory failing the test actually emits excessive pollutants is:

0.27/(0.27 + 0.14) = 0.66    (28)

Under the proposed conditions, the probability that a factory failing the test actually emits excessive pollutants is:

0.297/(0.297 + 0.154) = 0.66    (29)

The same answers result from using Equation 17. The conclusion should be not to invest in more sensitive devices.

Life Expectancy

A classical example of applying Bayes' Theorem concerns the life expectancy of jet engines. Typically, reliability goals are expressed in terms of the required probability of failure-free operation over the time interval between successive engine overhauls. Consider a specified reliability of 0.99 and an overhaul interval of 100 hours. Reliability predictions are made using the exponential probability density function shown in Figure 3, wherein t is the overhaul interval value. The area under the curve to the right of t is the probability that the time between successive failures is equal to or greater than t for the given value of mean time between failures (MTBF). Similarly, the area under the curve to the left of t is the probability that the time between successive failures is less than t for the given MTBF.

Figure 3. Exponential Probability Density Function for MTBF of 9,950 Hours

The following expression is used to calculate the required MTBF from specified reliability and overhaul interval values:

P = e^(-t/MTBF)    (30)

where the constant e equals 2.718. For the example, applying Equation 30 yields:

MTBF = 9,950 hours    (31)

The MTBF of 9,950 hours serves as the nominal design value and might have a tolerance of plus or minus ten percent for values of 10,945 or 8,955 hours. The question is one of validating the specified value of MTBF without testing every engine over the overhaul interval, which decreases overall life expectancy. The approach to this problem consists of the following steps.

1. Confidence factors are assigned to the MTBF tolerance limits using knowledge of parts variability, and engineering experience and judgment. The factors are used as a priori probabilities of realizing the respective MTBF values in the design, and are listed in Table III for the example.

2. A limited number of prototypes are operated over the overhaul interval in accordance with a specified succeed-fail ratio.

3. The conditional probabilities of successful trial are calculated given that the respective MTBF values were realized in the design. This step first requires the calculation of marginal probabilities of failure-free operation over time periods equal to the respective MTBF values.
4. Bayes' Theorem is then used to calculate the a posteriori probability of realizing the specified MTBF value in the design on the basis of the a priori probabilities from Step 1, given the conditional probabilities from Step 3 and a successful trial.

5. The design is deemed acceptable if the specified succeed-fail ratio is achieved and the a posteriori probabilities equal or exceed the a priori probabilities.

TABLE III. Life Expectancy Example Data

MTBF, Hours    A Priori Probability P(MTBF)
8,955          0.10
9,950          0.80
10,945         0.10

Continuing the example, assume the succeed-fail ratio is 5:2; that is, five engines are operated for 100 hours each, and two or fewer failures constitute success. Table IV gives the sample space for the successful trial, with G and B connoting good or bad results.

TABLE IV. Life Expectancy Example Sample Space

G G G G G
B G G G G    G B G G G    G G B G G    G G G B G    G G G G B
B B G G G    B G B G G    B G G B G    B G G G B    G B B G G
G B G B G    G B G G B    G G B B G    G G B G B    G G G B B

Note that there is one state for no failures, five states for one failure, and ten states for two failures, for a total of 16 states. The generalized expression for the number of states in a sample space is:

C(n,r) = n!/[r!(n-r)!]    (32)

where C(n,r) denotes the number of combinations possible of n items taken r at a time. In the example, n = 5 and r = 2, 1, and 0; applying Equation 32 yields:

C(5,2) + C(5,1) + C(5,0) = 5!/(2!3!) + 5!/(1!4!) + 5!/(0!5!) = 10 + 5 + 1 = 16    (33)

Next, marginal probabilities of failure-free operation over time periods equal to the respective MTBF values are calculated. Applying Equation 30 yields:

P(8,955) = e^(-8,955/9,950) = 0.40    (34)

P(9,950) = e^(-9,950/9,950) = 0.37    (35)

P(10,945) = e^(-10,945/9,950) = 0.33    (36)

These values equal the area under the curve to the right of t in Figure 3 when t equals the respective periods. In addition, the areas under the curve to the left of t are the probabilities of failure in the respective periods, that is, (1-P). These values are used to calculate the conditional probabilities of achieving the specified succeed-fail ratio, given the respective MTBF values, by applying Equation 15 in the following form:

P(E|MTBF) = SUM(r=0 to 2) P^(n-r) (1-P)^r s    (37)

where P(E|MTBF) denotes the conditional probability of Event E, in this case achieving the specified succeed-fail ratio, given the respective MTBF values, and s denotes the number of states in the sample space for the respective number of failures r. Applying the values from Equations 34, 35, and 36, and the numbers of states from Table IV, in Equation 37 yields:

1. For MTBF of 8,955 hours:

P(E|8,955) = (0.40)^5 (0.60)^0 (1) + (0.40)^4 (0.60)^1 (5) + (0.40)^3 (0.60)^2 (10) = 0.32    (38)

2. For MTBF of 9,950 hours:

P(E|9,950) = (0.37)^5 (0.63)^0 (1) + (0.37)^4 (0.63)^1 (5) + (0.37)^3 (0.63)^2 (10) = 0.27    (39)

3. For MTBF of 10,945 hours:

P(E|10,945) = (0.33)^5 (0.67)^0 (1) + (0.33)^4 (0.67)^1 (5) + (0.33)^3 (0.67)^2 (10) = 0.20    (40)

The probability of the specified MTBF value being realized in the design, given that the specified succeed-fail ratio is achieved, can now be calculated by applying Bayes' Theorem as expressed in Equation 17:

P(9,950|E) = (0.80)(0.27)/[(0.10)(0.32) + (0.80)(0.27) + (0.10)(0.20)] = 0.81    (41)

This value only marginally exceeds the 0.80 confidence factor required by engineering; even if the specified succeed-fail ratio were achieved in the trial, the design should be deemed only marginally acceptable. The MTBF tolerance limits about the design goal should be tightened.

CONCLUSION

Bayes' Theorem is at the heart of modern statistical decision theory. It is a powerful tool for reasoning from effect to cause and, conversely, for predicting the outcomes of alternative strategies. The theorem is particularly useful in "what-if" kinds of decisions. The quality of resultant decisions is sound from the viewpoint of both realism and mathematical rigor.
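As a closing illustration, the life-expectancy validation can be reproduced end to end in a few lines of Python (a sketch under the paper's assumptions; all names are ours). Full floating-point precision gives a posterior slightly below the value obtained with the rounded intermediate probabilities used in the worked equations:

```python
import math

mtbfs = [8955.0, 9950.0, 10945.0]     # tolerance limits and nominal, hours
priors = [0.10, 0.80, 0.10]           # Table III confidence factors
nominal = 9950.0                      # nominal design MTBF, hours

def survival(period, mtbf=nominal):
    """Equations 34-36: P = exp(-period/MTBF)."""
    return math.exp(-period / mtbf)

def p_trial(p, n=5, max_failures=2):
    """Equation 37: probability of at most max_failures in n runs,
    summing C(n,r) * p^(n-r) * (1-p)^r over r = 0 .. max_failures."""
    return sum(math.comb(n, r) * p**(n - r) * (1 - p)**r
               for r in range(max_failures + 1))

# Bayes' Theorem (Equation 17): posterior for each MTBF hypothesis.
likelihoods = [p_trial(survival(m)) for m in mtbfs]
branches = [pr * lk for pr, lk in zip(priors, likelihoods)]
posteriors = [b / sum(branches) for b in branches]

print(round(posteriors[1], 2))   # ~0.80 at full precision
```

The nominal hypothesis lands essentially at its 0.80 prior, which supports the paper's recommendation to tighten the tolerance limits rather than rely on the trial alone.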