tionally, a moving average filter is applied in the next step. After that, a hysteresis switch decides when the valve has to be opened or closed.

[Fig. 9. Structure of pressure control system. Blocks: Pressure Sensor 1, Pressure Sensor 2, Pressure Sensor 3, Average Filter, Moving Average Filter, Hysteresis Switch.]

The formulae defining the digital filters are shown in figure 10. The average filter weights all three pressure values equally to calculate the actual overall pressure value. The moving average filter uses the last five actual pressure values to calculate the smoothed average pressure value, again weighting all values equally. The following symbols are used: $p_i[k]$ is the pressure $i$ at time $k$, $p[k]$ is the average pressure at time $k$, and $p_a[k]$ is the smoothed average pressure at time $k$.

The hysteresis loop of the switch is shown in figure 11. Although a pressure between 480 and 520 bar is optimal, the threshold values of the hysteresis have been set to 490 and 510 bar in order to have a 10 bar reserve at both ends. As the target pressure is 500 bar, a tolerance interval of +/- 10 bar has been established before the valve is opened or closed, respectively.

Failure Behavior. In the next step, the failure behavior must be defined. Therefore, the failure modes of the three components, which can be considered as tasks of the system, are specified. As explained earlier, we use extended Petri nets to describe the failure behavior.

Average Filter. The failure behavior model of the average filter is shown in figure 12. The general structure illustrated in figure 5 has been used to model the failure behavior of that task. The qualities of the three pressure values p1, p2, and p3 are considered as input. If any of these values has a relative error different from zero, the task is set to the failure mode Error; otherwise, the task remains in the mode Normal.
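The signal path described above (average filter, moving average filter, hysteresis switch) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: all function and class names are hypothetical, the start-up behavior of the moving average window and the initial valve state are assumptions, and the valve is assumed to open above the upper threshold (510 bar) and to close below the lower threshold (490 bar), with strict comparisons.

```python
from collections import deque


def average_filter(p1, p2, p3):
    """Average of the three sensor values, all weighted equally."""
    return (p1 + p2 + p3) / 3.0


class MovingAverageFilter:
    """Equally weighted mean of the last `size` average pressure values."""

    def __init__(self, size=5):
        self.window = deque(maxlen=size)

    def update(self, p):
        self.window.append(p)
        # Assumption: before the window is full, average what is available.
        return sum(self.window) / len(self.window)


class HysteresisSwitch:
    """Valve command with hysteresis thresholds at 490 and 510 bar."""

    def __init__(self, lower=490.0, upper=510.0):
        self.lower = lower
        self.upper = upper
        self.valve_open = False  # assumed initial state

    def update(self, pa):
        if pa > self.upper:
            self.valve_open = True   # pressure too high: open the valve
        elif pa < self.lower:
            self.valve_open = False  # pressure too low: close the valve
        return self.valve_open       # inside the band: keep the last command


def control_step(sensors, smoother, switch):
    """One sampling step: three sensor values in, valve command out."""
    p = average_filter(*sensors)
    pa = smoother.update(p)
    return switch.update(pa)
```

Calling control_step once per sampling period with the same MovingAverageFilter and HysteresisSwitch instances reproduces the chain of figure 9; between 490 and 510 bar the switch simply keeps its previous command.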
If the task is in the normal mode, the output value has no error; therefore, the error attribute is set to 0 (the setting of the attribute validity has been neglected to keep the example simple). In the failure mode Error, the relative error of the output pressure is calculated using the relative errors of the input values, as specified by the formula shown in figure 13. The current failure mode is represented by an additional token on the place currentfailuremode, as mentioned earlier. The transitions outside the box are required as interface and will be connected to transitions in the superordinate net, as explained in an earlier section.

Fig. 10. Formulae describing the digital filters.
Average filter: $p[k] = \frac{1}{3}\,(p_1[k] + p_2[k] + p_3[k])$
Moving average filter: $p_a[k] = \frac{1}{5} \sum_{i=0}^{4} p[k-i]$

[Fig. 11. Hysteresis loop used for the hysteresis switch. Axes: switch command [open/close] over pressure [bar].]

Fig. 12. Petri net specifying the failure behavior of the average filter. Places: Normal, Error, currentfailuremode. Interface transitions: enter, getstate.
Guards: g1: p1.error > 0 or p2.error > 0 or p3.error > 0; g2: p1.error == 0 and p2.error == 0 and p3.error == 0
Actions: a1: p.error = 1/3 (p1.error + p2.error + p3.error); a2: p.error = 0

Fig. 13. Formula specifying error propagation of the average filter.
$p.\mathrm{error} = \frac{1}{3}\,(p_1.\mathrm{error} + p_2.\mathrm{error} + p_3.\mathrm{error})$

Moving Average Filter. The moving average filter uses the last five values of the average filter, whereby a new value is sampled every 2 milliseconds. The error propagation can be specified, in general, depending on the number of regarded samples N and the sample period T, as shown in figure 14.

Fig. 14. Formula specifying error propagation of a moving average filter.
$w = \min\left(1, \frac{\lfloor p.\mathrm{MTTC} / T \rfloor}{N}\right)$
$p_a.\mathrm{error} = w \cdot p.\mathrm{error}$

The quotient of the MTTC over the period T defines how many samples can be influenced by a fault. All other samples have no error (it is assumed that MTTO ≫ N · T).
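The two error propagation rules above (figures 13 and 14) can be written as small Python functions. This is a sketch, not the authors' tooling: the function names are hypothetical, errors are given in percent, the floor operation follows the action a1 of the moving average filter's Petri net (figure 15), and MAX_INT is only a stand-in for the MAXINT value used for persistent errors.

```python
import math
import sys

MAX_INT = sys.maxsize  # stand-in for the paper's MAXINT (persistent error)


def average_filter_error(e1, e2, e3):
    """Relative error of the average pressure: mean of the input errors."""
    return (e1 + e2 + e3) / 3.0


def moving_average_weight(mttc_ms, n_samples=5, period_ms=2.0):
    """Weight w: share of the N window samples that a fault can influence."""
    influenced = math.floor(mttc_ms / period_ms)
    return min(1.0, influenced / n_samples)


def moving_average_error(p_error, mttc_ms, n_samples=5, period_ms=2.0):
    """Relative error of the smoothed pressure: pa.error = w * p.error."""
    return moving_average_weight(mttc_ms, n_samples, period_ms) * p_error
```

With N = 5 and T = 2 ms, an illustrative fault lasting 6 ms influences three samples, so w = 0.6; a fault shorter than one sample period gives w = 0, and an MTTC of MAX_INT gives w = 1, so the input error is passed through unchanged.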
Therefore, the ratio between the number of influenced samples and the number of all samples defines the weight w of the relative error. The corresponding Petri net for the filter is shown in figure 15. Again, the failure modes Normal and Error are defined. The normal mode is valid when the input is correct; otherwise, the task is set to the error mode. In the normal mode, again, the error attribute of the output quality is set to zero. In the error mode, the relative error of the output value is calculated according to the formula shown in figure 14. To express a persistent error with the MTTC, the latter is set to the maximal possible integer value (MAXINT). If an error is persistent, all samples are faulty; therefore, pa must have the same relative error as p. This requirement is met if the MTTC is set to MAXINT, as in that case the weight w always evaluates to 1, i.e. the error of p is assigned to the error attribute of pa.

Fig. 15. Petri net specifying the failure behavior of the moving average filter. Places: Normal, Error, CurrentFailureMode. Interface transitions: enter, exit, getstate.
Guards: g1: p.error > 0; g2: p.error == 0
Actions: a1: pa.error = min(1, 1/5 * floor(p.mttc / 2)) * p.error; a2: pa.error = 0

Fig. 16. Maximal relative error for the upper threshold value.
Value of pa is lower than the actual pressure: $520\,(1 - x) \ge 510 \Rightarrow x \le 1.9\%$
Value of pa is higher than the actual pressure: $480\,(1 + x) \le 510 \Rightarrow x \le 6.25\%$

Hysteresis Switch. Before we define the Petri net for the hysteresis switch, we examine its failure behavior. The hysteresis loop is illustrated in figure 11, and the consequences of too low or too high a pressure are shown in figure 8. Besides normal operation, wrong pressure values can result in a problematic or a dangerous situation, respectively. It is therefore reasonable to define the three failure modes Normal, Problem, and Danger. Now, we must examine which errors result in which failure mode. At first sight, it seems quite simple to define guards like p.currentvalue > 520. However, our approach is applicable at early stages of the development process; therefore, we do not assume that absolute values or absolute errors are available. For this reason, we regard the threshold values and use the available relative errors to obtain the absolute values. This is sufficient, as it is only necessary to consider the worst case. We must regard both threshold values and, since the relative error does not express whether the faulty value is higher or lower than the actual value, it is also necessary to cover both cases. As an example, we calculate the maximal relative error that is allowed to remain in the normal mode for the upper threshold value. The upper threshold value is 510 bar.
Actually, it is only necessary to open the valve when the pressure is higher than 520 bar. If the faulty value of pa is lower than the actual pressure in the tank, it must nevertheless be ensured that the valve is opened at the latest when the pressure is higher than 520 bar. That means, if the pressure is 520 bar, the current, faulty value of pa must be at least 510 bar. According to the upper formula shown in figure 16, the relative error must therefore be lower than or equal to 1.9%. If the value of pa is higher than the actual pressure, the valve might be opened too early. We demand that the pressure must be at least 480 bar before the valve is opened. The corresponding maximal relative error is calculated using the lower formula of figure 16. The remaining maximal relative errors can be calculated in the same way.

The Petri net representing the resulting failure behavior of the hysteresis switch is shown in figure 17. As the output of this task is a direct system output (the command for the valve), it is only necessary to pass on the failure mode to obtain the influence on the system behavior.

Fig. 17. Petri net specifying the failure behavior of the hysteresis switch. Places: Normal, Problem, Danger, currentfailuremode.
Guards: Normal: pa.error <= 1.9; Problem: (pa.error > 1.9) and (pa.error <= 3.8); Danger: pa.error > 3.8

Interdependencies of tasks. So far, the failure behavior of three single tasks has been described. Now, it is necessary to combine the single Petri nets in a superordinate Petri net to obtain the system failure behavior. The overall Petri net specifying the failure behavior of the pressure control system is shown in figure 18. Mainly, the system structure has been rebuilt. First, an instance of the Petri net representing the average filter is created, and three qualities of the pressure values can be injected, as explained in section 3.5. The current failure mode and the quality of the average pressure value are requested from the subnet describing the average filter.
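The two bounds of figure 16 and the guards of figure 17 can be checked with a short Python sketch. The function names are hypothetical and the errors are expressed in percent; the guard limits 1.9 and 3.8 are taken from figure 17 as given, since the derivation of the 3.8% limit is not spelled out in the text.

```python
def max_error_reading_low(actual_bar, threshold_bar):
    """Largest x (percent) with actual * (1 - x) >= threshold."""
    return (1.0 - threshold_bar / actual_bar) * 100.0


def max_error_reading_high(actual_bar, threshold_bar):
    """Largest x (percent) with actual * (1 + x) <= threshold."""
    return (threshold_bar / actual_bar - 1.0) * 100.0


def hysteresis_failure_mode(pa_error):
    """Failure mode of the hysteresis switch for a relative error in percent."""
    if pa_error <= 1.9:
        return "Normal"
    if pa_error <= 3.8:
        return "Problem"
    return "Danger"


# Upper threshold 510 bar, tolerable pressure range 480 to 520 bar.
low_case = max_error_reading_low(520.0, 510.0)    # about 1.92 percent
high_case = max_error_reading_high(480.0, 510.0)  # 6.25 percent
```

The smaller of the two values is the binding one, which matches the limit of 1.9% used for the normal mode.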
The quality of the average pressure value is used as input for an instance of the Petri net specifying the failure behavior of the moving average filter. The output of this subnet is its current failure mode and the quality of the smoothed average pressure value, which, in turn, is used as input for the hysteresis switch. As the output of the hysteresis switch is a direct system output, it is reasonable not to use an information quality as output, but only the current failure mode.

An analysis can be started by simulating the Petri net. Errors can be injected using the injection transitions. We create an information quality object, which is referenced by a token, and put this token on the place leading to the respective transition. These information quality tokens can either be defined and placed manually on the input places, or automatically by separate software. The injection and the propagation of the information qualities are then done automatically by a Petri net simulator [10]. Placing the tokens manually has the disadvantage that the Petri net simulator requires new information quality tokens to be created and placed for each analysis run. Furthermore, it is not possible to change the injections during an analysis. However, it is one major advantage of our approach that the

Fig. 20. Analysis of the influences of a transient sensor fault.
Created tokens: p1.valid = false; p1.error = 15%; p1.MTTC = 4; p2.valid = false; p2.error = 6%; p2.MTTC = 4; p3.valid = true
Result: average filter in mode Error with p.error = 7%; moving average filter in mode Error with pa.error = 2.8%; hysteresis switch in mode Problem.

the smoothed pressure value pa is reduced to 2.8%. That means, the efficiency of the dependent machines is reduced (see figure 17). It is obviously necessary to improve the behavior. In the next scenario, we assume that the engineers of the plant propose two possible improvements: First, it is possible to reduce the maximal time of persistence of the radiation from 4 to 3.5 milliseconds. Second, it is possible to shield the sensors; as a consequence, the relative errors could be reduced to 12% and 4.8%, respectively, which means a reduction of the disturbance by 20%. If we use these values as input for the analysis, we obtain the result that the reduction of the time of disturbance to 3.5 milliseconds is sufficient to maintain the normal mode. A persistence of the radiation of less than 4 milliseconds means that at most one sample is influenced. For that reason, the moving average filter compensates the relative error (with one of five samples influenced, the 7% error is reduced to 1.4%, which is below the 1.9% limit). Shielding the sensors as assumed above, however, is not sufficient. Despite the reduced relative errors, the overall pressure still has a relative error of 5.6%. The moving average filter only reduces the error to 2.24%, which is not sufficient to remain in the normal mode.

5 Conclusion: Analyzing large distributed systems

In this paper, we introduced a flexible, scalable, and extensible approach for failure behavior analysis. In comparison to existing analyses, like FTA or FMEA, our analysis yields more sophisticated results, which enable the analyst to understand the system behavior in the case of faults. In the application example, we demonstrated the applicability of our analysis. However, we limited the complexity of the example to avoid going beyond the scope of a paper.
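The numbers of the application example can be reproduced without a Petri net simulator by chaining the three propagation rules directly. The following Python sketch is only an illustration of that arithmetic, not the authors' tool chain: the function name propagate and the scenario labels are hypothetical, errors are in percent, and the third sensor is taken as error-free, which is consistent with the 7% shown in figure 20.

```python
import math


def propagate(sensor_errors, mttc_ms, n_samples=5, period_ms=2.0):
    """Return (p.error, pa.error, failure mode); errors in percent."""
    # Average filter: mean of the input errors (figure 13).
    p_error = sum(sensor_errors) / len(sensor_errors)
    # Moving average filter: weight by the share of influenced samples (figure 14).
    w = min(1.0, math.floor(mttc_ms / period_ms) / n_samples)
    pa_error = w * p_error
    # Hysteresis switch: guards of figure 17.
    if pa_error <= 1.9:
        mode = "Normal"
    elif pa_error <= 3.8:
        mode = "Problem"
    else:
        mode = "Danger"
    return p_error, pa_error, mode


scenarios = {
    "original fault": ((15.0, 6.0, 0.0), 4.0),
    "shorter persistence": ((15.0, 6.0, 0.0), 3.5),
    "shielded sensors": ((12.0, 4.8, 0.0), 4.0),
}

for name, (errors, mttc) in scenarios.items():
    p_error, pa_error, mode = propagate(errors, mttc)
    print(f"{name}: p.error={p_error:.2f}%  pa.error={pa_error:.2f}%  mode={mode}")
```

The script reports 7.00% and 2.80% (Problem) for the original fault, 7.00% and 1.40% (Normal) for the persistence reduced to 3.5 ms, and 5.60% and 2.24% (Problem) for the shielded sensors, in line with the results stated in the example.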
Our research focuses on large, distributed embedded systems. Therefore, our analysis has been developed for those systems. Mastering the complexity of those systems is one major problem. For that reason, the scalability of our approach is of crucial importance. A further essential aspect is the possibility to automatically generate major parts of the Petri nets. For example, it is even possible to generate a simple failure behavior that sets the validity attribute of the output information qualities to false if any input quality is invalid. In this way, results similar to those obtained by FMEA are yielded automatically without any additional effort of the analyst. Although the generated Petri nets define only a very coarse approximation of the actual failure behavior, it is, in contrast to a common FMEA, possible to examine the effects of fault combinations very easily or even automatically. A further aspect that is important for the analysis of large systems is reuse. If tasks or components are reused in other projects, the Petri nets defining their failure behavior can be reused, too.

If distributed systems are to be analyzed, our approach has two further advantages. First, the partitioning of the system is supported, as one can examine which tasks should be assigned to which partition so that a failure of one partition has the least influence on the overall system behavior. Second, the effects of missing or delayed information interchanged between system partitions can be analyzed. The delay of a piece of information can be assigned to its quality, and the effects on single tasks can be modelled explicitly with Petri nets. The effect on the overall system behavior is obtained automatically, as the error propagation is provided by the Petri net simulator.

6 References

[1] G. Booch, I. Jacobson, J. Rumbaugh, The Unified Modelling Language User Guide, Addison Wesley Longman, Reading, MA
[2] A. Metzger, S. Queins, A Reuse- and Prototyping-based Approach for the Specification of Building Automation Systems, OMER-2 Workshop, Herrsching, Germany, 2001
[3] IEC ( ), Fault Tree Analysis, International Electrotechnical Commission, Geneva, Switzerland, 1990
[4] K. Yang, C. K. Kapur, Customer Driven Reliability: Integration Of QFD And Robust Design, Proceedings IEEE Annual Reliability and Maintainability Symposium, 1997
[5] C. J. Price, N. S. Taylor, FMEA For Multiple Failures, Proceedings IEEE Annual Reliability and Maintainability Symposium, 1998
[6] B. Berard, M. Bidoit, A. Finkel, F. Laroussinie, A. Petit, L. Petrucci, Ph. Schnoebelen, P. McKenzie, Systems and Software Verification, Springer Verlag, Berlin, 2001
[7] H. Hermanns, Construction and Verification of Performance and Reliability Models, in Bulletin of the European Association for Theoretical Computer Science (EATCS), 2001
[8] H. Hermanns, J.P. Katoen, J. Meyer-Kayser and M. Siegle, A Markov chain model checker, Proceedings of the Sixth International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Springer Verlag, Berlin, 2001
[9] Olaf Kummer, Simulating Synchronous Channels and Net Instances, 5. Workshop on Algorithms and Tools for Petri Nets, 1998
[10] Olaf Kummer, Frank Wienberg, RENEW - The Reference Net Workshop, Petri Net Newsletter, No. 56,

