FORMAL ANALYSIS FOR REAL-TIME SCHEDULING Bruno Dutertre and Vctora Stavrdou, SRI Internatonal, Menlo Park, CA Introducton In modern avoncs archtectures, applcaton software ncreasngly reles on servces provded by a real-tme operatng system (RTOS). An applcaton s typcally structured n sets of processes that share common hardware resources va the RTOS. Such archtectures present numerous advantages for software development by decouplng the applcaton software from the specfcs of the underlyng hardware. However, they also present challengng certfcaton problems. The RTOS s a hghly crtcal component of the overall avoncs system and must be certfed to the hghest levels of assurance. In addton, new software ntegraton ssues have to be addressed such as ensurng that the processes that share common resources wll satsfy ther performance requrements. Schedulng and synchronzaton servces are among the most crtcal servces an RTOS must provde. These servces can be partcularly subtle, and t s dffcult to obtan strong evdence that they wll perform properly n all crcumstances. Testng and nspecton are not suffcent for ths purpose. Asde from the pure problem of showng that the RTOS behaves as expected, a second ssue s to determne whether an applcaton that reles on the schedulng dscplne and uses the communcaton servces provded by the RTOS satsfes ts tmng requrements. A well-establshed theory of real-tme schedulng does exst, that should provde a sound foundaton to the development and valdaton of real-tme system. Yet, the abundant theoretcal work may not by tself provde the degree of assurance requred n safety-crtcal domans such as avoncs. The lterature has tradtonally reled on nformal approaches, and proofs often rely more on ntutve explanatons than precse, rgorous arguments. Snce the problems are complex and subtle, there are examples of erroneous results and flawed proofs n the lterature. In addton, real operatng systems are often rcher and more complex than consdered n the lterature. In ntegrated avoncs, RTOS must provde strong parttonng guarantees to prevent nterference between processes sharng common hardware resources. Complex systems relyng on faulttolerant rate-monotonc schedulng are beng bult [1] or kernels employng a nontrval mxng of statc and prorty-based schedulng are beng consdered [2]. The general theory rarely apples drectly but often has to be adapted to a specfc context. We examne how formal methods can help address these ssues. Formal methods can be used to develop very precse, complete, and rgorous proofs that theoretcal results are correct and properly appled to a partcular context. Formal modelng and verfcaton can provde very strong evdence that an RTOS satsfes crtcal schedulng and synchronzaton propertes. As an llustraton, we dscuss the formalzaton and verfcaton of the prorty-celng protocol [3], a schedulng and synchronzaton protocol used n common real-tme operatng systems. We then dscuss extensons of ths work to more complex types of kernels and the benefts of formal methods n real-tme schedulng problems. Real-Tme Schedulng Issues n Safety-Crtcal Applcatons In tradtonal real-tme systems, an applcaton s decomposed nto a set of processes or tasks that run concurrently on one of more processors. A schedulng polcy determnes how the processng resources are shared between the dfferent tasks. Typcal RTOSs use fxed-prorty preemptve schedulng: each task s assgned a fxed prorty, and, at all tme, the task of hghest prorty that s ready to execute s allocated the processor. Other polces are also possble, such as earlest-deadlnefrst schedulng, table-drven schedulng, or other more complex schemes. The fundamental problem n ths context s to determne condtons under whch a gven set of 1
tasks satsfes ts tmng requrements. Many theoretcal results provde such guarantees for dfferent classes of systems, relyng on dfferent schedulng polces, and wth dfferent assumptons about tasks and tmng constrants. In partcular, fxed-prorty preemptve schedulng has been extensvely studed and schedulablty results are known for many types of task sets and tmng constrants. A smple stuaton s when the tasks are ndependent and perodc. In such a case, n tasks τ 1,,τ n are each characterzed by a computaton cost C and a perod T. Task τ s actvated at successve tmes t0, t0 + T, t0 + 2T, and so forth, and each nvocaton must termnate wthn a oneperod nterval. In other words, τ must be allocated a total amount of C tme unts of processng tme n every nterval [ t 0 + kt, t0 + kt + T ). In fxedprorty schedulng, t s known that, under the ratemonotonc prorty assgnment 1, the problem s feasble f the followng condton s satsfed n = 1 C T n(2 1/ n 1). Lu and Layland proved ths property n 1973 [4]. They also establshed other mportant results, such as the fact that the rate-monotonc prorty assgnment s optmal. The condton above s based on a worst-case analyss and many task sets that do not satsfy the nequalty are stll feasble. A better analyss technque conssts of computng the worstcase response tme of a task. The procedure s sketched n Fgure 1, where H s the set of tasks of hgher prorty than τ. The sequence ( W ) s computed untl ether t converges to a fxed pont k W T or t reaches a value W that s greater than T. In the former case, every nvocaton of task τ s guaranteed to termnate wthn a delay W. In the latter case, τ cannot be guaranteed to meet ts deadlnes. The approach sketched n Fgure 1 (as well as other approaches) generalze to many other task models, ncludng models where tasks can communcate wth each other and where deadlnes k are dfferent from the task perods [5,6]. Wth such analyss technques, the theory of real-tme schedulng, especally n the fxed-prorty context, s now mature and applcable to real-world systems. W W 0 k+ 1 = C = C + j H W C j T j Fgure 1. Computng the Worst-Case Response Tme of a Task However, n safety-crtcal applcatons, hgh assurance s requred, and the followng ssues have to be examned: Whch guarantees do we have that the results publshed n the lterature are actually correct? Whch guarantees do we have that a gven RTOS actually satsfes the assumptons of an abstract schedulng model? Correctness Issues Although the theory of real-tme schedulng can be developed usng rgorous mathematcs, results are often presented less formally. The assumptons made are not always carefully specfed. The proofs are often mprecse and based on ntutve explanatons rather than rgorous arguments. As the problems are subtle and complex, there are examples of erroneous results, and ncomplete or flawed proofs n the lterature. Even Lu and Layland s essental results rely on many propertes that all look ntutvely reasonable but are not rgorously proven [4]. More precse proofs of the same results have been gven snce, but there s an obvous rsk n relyng on propertes whose proof s based on ther authors ntuton. Ths rsk s exacerbated when one moves away from the smplest schedulng models. In many systems that are relevant for avoncs, process schedulng nteracts wth other crtcal mechansms, such as those supportng nter-process communcaton, parttonng, or fault-tolerance. In such contexts, our ntuton s much more lkely to be wrong than n the basc, fxed-prorty model. k j 1 The tasks wth lower perod are gven hgher prorty. 2
For example, Ghosh et al. [7] present a schedulng approach ntended to tolerate transent faults by reexecutng faulty tasks. Generalzng Lu and Layland s results, they gve condtons under whch perodc tasks are feasble usng ths scheme. In a subsequent paper, the same authors found a flaw n ther ntal analyss and proposed a dfferent fault-tolerant schedulng scheme [8]. Unfortunately, as dscovered by Snha and Sur [9], ther analyss of the new scheme s also flawed; there are task sets that satsfy the condtons n Ghosh et al. [8], but where deadlnes are not met f a fault occurs. Ths example llustrates the dffculty of correctly analyzng complex schedulng polces usng only nformal models and proofs. It also shows that the nformal revew process used n academc research may fal to spot serous errors. The example s also very relevant to the avoncs communty snce the fault-tolerant schedulng scheme of Ghosh et al. has been mplemented n an RTOS specfcally desgned for avoncs applcatons [1]. (The approach mght work n ths case because the RTOS n queston only supports harmonc task sets.) Valdty of Models Most results n real-tme schedulng are developed usng very abstract scheduler models. Typcally, only a few assumptons are made about the scheduler, such as whch job s actve when several are ready to execute. To apply theoretcal results n crtcal applcatons, one needs to ensure that the relevant assumptons are satsfed by the system at hand. Ths can be dffcult as there s usually a large gap between the abstract models used to derve schedulng results and the real schedulng algorthms used n an RTOS. Part of the dffcultes s due to the complex nteracton between schedulng and other servces provded by the RTOS. As a mnmum, nter-process communcaton requres synchronzaton servces that may cause a process to wat and be delayed by other processes. Other features of an RTOS, such as nterrupt handlng or ts clock frequency also have an mpact on task executon. Because of fault-tolerance requrements, RTOS for crtcal applcatons may also provde features, such as temporal parttonng [2], that mpose constrants on how a processng tme s allocated to tasks. Such systems do not follow exactly the smple prorty-based allocaton polcy that s the most commonly studed n theory. Furthermore, because of lmtatons of the smple approaches, more complex schedulng strateges are beng proposed. For example, maxmal-urgencyfrst schedulng [10] s a complex combnaton of prorty-based and deadlne-based schedulng. The approach ntends to ensure that crtcal tasks never mss a deadlne, whle achevng hgh processor utlzaton. Ths new approach has undenable performance advantages over more classc schedulng strateges and may be adopted by practcal systems, but t s also more complex to mplement and to analyze. To the best of our knowledge, ths approach has been evaluated only emprcally [10,11]. Even though many types of schedulng dscplnes and task sets have been nvestgated theoretcally, real systems may not always match any of the theoretcal models. Stll, the lterature provdes general analyss methodologes that can often be adapted to new contexts. The problem s to obtan hgh assurance that such general methods have been properly appled to the gven system. A Formal Analyss Methodology for Real-Tme Schedulng The lterature on real-tme schedulng contans many results that should provde sound bases for the engneerng of real-tme systems. For such results to be appled n safety-crtcal applcatons, one must obtan very hgh confdence that they are correct. Furthermore, as new schedulng strateges are ntroduced, there s a need for rapdly adaptng schedulng results to complex systems that do not necessarly match tradtonal models. As dscussed prevously, very rgorous approaches are necessary to catch potental errors or omssons n models and n proofs. Formal methods and tool-asssted verfcaton can provde the requred degree of precson and rgor, and as dscussed n the sequel, can be appled effectvely to nontrval examples. We propose a general methodology for modelng real-tme schedulers, provng that they satsfy crtcal propertes, and dervng schedulng propertes for dfferent types of tasks. Our objectve s an approach that supports 3
both the verfcaton of qualtatve propertes related to synchronzaton (e.g., the absence of deadlocks) and the dervaton of hgh-level real-tme results (e.g., condtons under whch deadlnes are met). To support the approach, we use the PVS theorem prover [12]. In partcular, we have developed a large lbrary of basc notons and results that are useful for modelng systems and verfyng hgh-level schedulng propertes. Ths PVS lbrary also ntroduces the basc concepts that are central to real-tme schedulng analyss. Applcaton to a specfc schedulng protocol conssts of buldng a state-machne model of the scheduler, examnng the executon traces of ths model, and relatng these traces to the basc notons and results provded by the lbrary. Support for Tmng Analyss Although schedulng algorthm can vary a lot, many generc notons and basc facts are useful for reasonng about dfferent approaches. The lterature also provdes some general analyss technques, such as computng worst-case response tmes, that are applcable n many contexts. To take advantage of these exstng results, we have developed a lbrary of PVS theores that formalze and prove many useful propertes. The lbrary s based on the basc concept of a dscrete-tme abstract schedule that represents the allocaton of a sngle resource (e.g., a processor) to dfferent jobs. Gven a set of jobs J, a schedule s an nfnte sequence u = u, u1,, u, 0 t where u t ndcates whch job, f any, owns the resource at tme t. Ether u t = j and j J s the job that s actve at tme t, or u t = f there s no job actve at that tme. Analyss of tmng propertes can then be based on measurng how much tme has been allocated to a partcular job or sets of jobs durng. Gven a set E J, the PVS functon process_tme(u, t1, t2, E) gves the amount of tme allocated to jobs of E n the nterval [ t 1, t2 ). A smlar functon gves the amount of dle tme n an nterval. These basc notons and ther propertes are mplctly used n many theoretcal works on real-tme schedulng. A set of PVS theores contans many useful propertes of these basc functons. Another part of our PVS lbrares ntroduces results that support tmng analyss va the computaton of worst-case response tmes. For example, we have formalzed n PVS a generalzaton of the algorthm descrbed n Fgure 1. These results are very useful snce they can be appled n general contexts. In partcular they are applcable to many prorty-based preemptve schemes. Modelng Schedulers To apply the results from the lbrary to a partcular system, the frst step s to buld a model of the schedulng algorthm used. A natural approach s to use state-machne models. Such models have the advantage of provdng operatonal specfcatons and are easy to understand, thus reducng the gap between mplementaton and model. There are also well-known verfcaton technques (some of whch can be fully automated) for provng propertes of such automata-based models. It s also mportant for our purpose to obtan models that convenently support tmng analyss. Snce tmng results depend as much on task characterstcs as on scheduler propertes, we need the ablty to ntroduce assumptons about sets of tasks. A common approach s to decompose a task nto a successon of jobs, each job beng characterzed by attrbutes such as ts dspatch tme and duraton. Tmng characterstcs of tasks can then be specfed as constrants on the length of jobs and the delays between successve jobs. Our modelng approach follows the same general prncples. We buld a state-machne model that s parameterzed by a fxed but arbtrary collecton of jobs. All the job attrbutes are assumed fxed and known n advance to smplfy the modelng. To obtan general models, we assume as lttle as possble about the jobs. Ths allows us to obtan results that are vald for very general classes of jobs and then largely ndependent of the type of tasks consdered. For example, synchronzaton propertes that are vald whatever the task characterstcs can be establshed n ths way. 4
For obtanng hgher-level schedulng results that depend on the tmng characterstcs of tasks, the general models can be specalzed by restrctng attenton to jobs that satsfy certan propertes. Ths amounts to nstantatng the state-machne model wth specfc parameters that satsfy adequate assumptons. Analyss of these specalzed nstances can stll rely on the propertes that are nherted from the generc model. From State Machnes to Abstract Schedules Once a state-machne specfcaton s constructed, we can study ts real-tme propertes by examnng ts executon traces. More precsely, the state-machne model s parameterzed by a set of jobs J wth attrbutes such as ther startng tme, ther prorty, and ther duraton. The traces of the machne are sequences of state-transtons performed by the system for ths partcular set of jobs. From such traces, t s trval to construct abstract schedules that can be analyzed usng the support lbrary. As prevously, tmng propertes can be very general or vald only for specal nstances of the job parameters that satsfy relevant assumptons. An Example: The Prorty-Celng Protocol We have performed a full PVS formalzaton and verfcaton of a nontrval schedulng algorthm. Ths shows that precse and rgorous analyss of farly complex schedulng approaches can be performed wth modern theorem provers. Our case study was the prorty-celng protocol, ntroduced by Sha et al. [3]. Ths protocol s a synchronzaton algorthm for real-tme systems that employ fxed-prorty preemptve schedulng. In ts basc form, the protocol s used to control access to common resources, such as shared varables or I/O devces, n mono-processor systems. Access to crtcal sectons s controlled va bnary semaphores. Ths protocol has many nterestng propertes: t ensures mutual excluson and absence of deadlocks, and t guarantees that blockng delays are mnmal. The latter property s essental n realtme applcatons. It means that a process P requestng access to a resource S wll wat only for a bounded tme before S becomes avalable. The protocol s also mportant n practce: the prortycelng protocol or one of ts varant s used n most commercal RTOSs. As s often the case n the lterature, the descrpton of the protocol gven by ts authors s qute nformal and mprecse. The key propertes are also proved n a farly nformal manner. As the protocol s subtle and reles on complex prortyadjustment rules, an nformal specfcaton s open to msnterpretaton. The prorty-adjustment rules and the nteractons between mutual excluson and schedulng make the detals of the protocol dffcult to understand and qute challengng to verfy. Our objectve was to develop complete and precse specfcatons of the protocol, and obtan rgorous proofs that the assocated results are vald. We wanted to prove both key synchronzaton propertes such as mutual excluson and absence of deadlocks, and sgnfcant schedulablty results. Detals of the PVS developments and verfcatons are avalable elsewhere [13]. We used the general methodology descrbed prevously: We constructed a parameterzed state-machne model of the prorty-celng protocol that specfes how semaphores are allocated to jobs and how jobs can be actvated n a gven state. We then proved that all reachable states of the protocol satsfy nvarant propertes that ensure mutual excluson, absence of deadlocks, and mply that at most one job can block another. The proofs use standard technques based on nducton. In a second step, we examned the executon traces and schedules that the protocol model can produce. From the basc state-nvarant propertes, we obtaned general bounds on blockng delays and on the processng tme allocated to jobs. These propertes are vald for arbtrary collectons of jobs and are the bass for further schedulng analyss. The proofs used the noton of schedules and the varous results developed n the support lbrary. We then appled the general results to derve a well-known schedulablty crteron for sporadc tasks [5,6]. A sporadc task can be thought of as a sequence of jobs of the same prorty, separated by a mnmal nterarrval delay. Sporadc tasks are useful for modelng both strctly perodc computatons and nterrupt-drven actvtes wth a 5
lmted nterrupt frequency. Knowng bounds on the length of each job of a task and the length of crtcal sectons of other jobs, one can compute the worstcase response tme for jobs of a task τ, usng an teratve algorthm smlar to the one of Fgure 1. From ths worst-case executon tme, t s easy to determne f a task meet ts deadlnes. The case study shows that a full formal specfcaton and verfcaton of nontrval schedulng mechansms and propertes s feasble. In the same framework, we establshed both lowlevel synchronzaton propertes and hgh-level schedulablty propertes. All the developments were performed wth tool support, usng the PVS specfcaton and verfcaton system. Ths provdes very hgh assurance of correctness and ensures a complete and rgorous analyss. The whole PVS formalzaton requred between two and three person months of effort, part of whch was spent on developng the general support lbrares. The formalzaton contans 417 lemmas and theorems, for around 2500 lnes of specfcatons. The support lbrary and other reusable results that are ndependent of the prorty-celng protocol represent around 40% of the developments (1000 lnes). In addton to gvng very strong evdence that the protocol and the assocated schedulng test are correct, the formalzaton also provded other useful nsghts: Our specfcaton of the protocol s more precse and smpler than the tradtonal nformal descrpton. In partcular, the protocol can be entrely specfed wthout any prorty adjustment by usng a slghtly modfed rule for selectng actve jobs. Ths may pont to new ways of mplementng the protocol and was essental for smplfyng the verfcaton. Some new results emerged from the analyss, such as the fact that the protocol works under weaker assumptons than orgnally made by ts authors. Sha et al. assume that crtcal secton are properly nested [3], but ths requrement s actually unnecessary. All the mportant propertes are satsfed even f crtcal sectons overlap. Our PVS proof of the schedulablty result for sporadc tasks s based on the noton of busy perods, but s smpler and more drect than the classc proofs. Many nformal proofs of smlar results refer to Lu and Layland s theorem 1, whch states that the worst-case response tme for a task τ occurs when all tasks of hgher prorty than τ are dspatched at the same tme as τ [4]. The worst-case scenaro s then when all tasks are released at the same tme, called a crtcal nstant. Informal proofs are usually based on fndng the worst-case scenaro for a gven context and determnng response tmes n ths worst case. However, provng that the dentfed scenaro s actually the worst possble case can be dffcult 2. Our proof shows that lookng for such worst-case scenaros s actually an unnecessary detour. Concluson A sound theory of real-tme schedulng s necessary to support the engneerng of real-tme software applcatons. Developng schedulng results usng formal methods helps provde the hgh level of assurance requred n safety-crtcal domans. We propose a formalzaton methodology relyng on state-machne models and on a lbrary of commonly applcable propertes. A case study has shown that rgorous and a detaled verfcaton of nontrval schedulng results can be performed wthn reasonable tme lmts, usng modern theorem-provng tools such as PVS. It s stll necessary to develop and extend such verfcaton efforts to the ncreasngly complex schedulng algorthms that may one day be used n safety-crtcal applcatons. Other aspects relevant to the avoncs doman reman nsuffcently explored by the formal methods communty. For example, lttle has been done n the analyss of dstrbuted real-tme systems, where dffcult problems mxng communcaton, processng, faulttolerance must be addressed. Formal methods are especally valuable n such complex settngs, where nformal proofs guded by ntuton are nsuffcent and where other valdaton approaches such as testng are ncomplete. Formal methods should become an essental tool for valdatng the most crtcal aspects of real-tme systems, such as the 2 The proof of theorem 1 gven by Lu and Layland s partcularly unconvncng. 6
parttonng mechansms requred for fault solaton n ntegrated avoncs. Real-tme schedulng problems are an deal applcaton area for formal methods snce they are subtle and complex, must be certfed to the hghest degrees of assurance for supportng crtcal applcatons, and thus requre very precse, detaled, and rgorous verfcaton. Acknowledgements The work descrbed n ths paper was partally funded by the Defense Advanced Research Project Agency and by the Ar Force Research Laboratory under contract F30602-97-C-0040. References [1] Dong, Lbn, et al., 1999, Implementaton of a Transent-Fault-Tolerance Scheme on DEOS, In Proceedngs of the Real-Tme Technology and Applcaton Symposum (RTAS), Vancouver, Canada. [2] ARINC Specfcaton 653, Avoncs Applcaton Software Interface, Annapols, MD. [3] Sha, L., R. Rakjumar, and J. P. Lehoczky, 1990, Prorty Inhertance Protocols: An Approach to Real-Tme Synchronzaton, IEEE Transactons on Computers, Vol. 39, No. 9, pp. 1175-1185. [4] Lu, C. L. and James W. Layland, 1973, Schedulng Algorthms for Multprogrammng n a Hard-Real-Tme Envronment, Journal of the ACM, Vol. 20, No 1, pp. 46-61. [5] Audsley, N. C., A. Burns, M. F. Rchardson, and A. J. Wellngs, 1991, Hard Real-Tme Schedulng: The Deadlne Monotonc Approach, Proceedngs of the 8 th IEEE Workshop on Real-Tme Operatng Systems and Software, Atlanta, GA, pp. 127-132. [6] Sha, L., R. Rajkumar, and S. Sathaye, 1994, Generalzed Rate-Monotonc Schedulng Theory: A Framework for Developng Real-Tme Systems, Proceedngs of the IEEE, Vol. 82, No. 1, pp. 68-82. [7] Ghosh, S., R. Melhem, and D. Mossé, 1997, Fault-Tolerant Rate-Monotonc Schedulng, In Dependable Computng for Crtcal Applcatons (DCCA-6), IEEE Computer Socety. [8] Ghosh, S., R. Melhem, D. Mossé, and J. Sarma, 1998, Fault-Tolerant Rate-Monotonc Schedulng, Real-Tme Systems, Vol. 15, Kluwer Academc Publsher, pp. 149-181. [9] Snha, P. and N. Sur, 1999, On the Use of Formal Technques for Analyzng Dependable Real-Tme Protocols, In Proceedngs of the 20 th IEEE Real-Tme Systems Symposum (RTSS-99), Phoenx, AZ. [10] Steward, D. and P. Khosla, 1991, Real-Tme Schedulng of Sensor-Based Control Systems, Proceedngs of the 8 th IEEE Workshop on Real- Tme Operatng Systems and Software, Atlanta, GA, pp. 144-150. [11] Gll, C., D. Levne, and D. Schmdt, 2000, The Desgn and Performance of a Real-Tme CORBA Schedulng Servce, Internatonal Journal of Tme- Crtcal Computng Systems, Specal Issue on Real- Tme Mddleware, http://www.cs.wustl.edu/~schmdt/dynamc.ps.gz. [12] Owre, S., J. Rushby, N. Shankar, and F. von Henke, 1995, Formal Verfcaton for Fault- Tolerant Archtectures: Prolegomena to the Desgn of PVS, IEEE Transactons on Software Engneerng, Vol. 21, No 2, pp. 107-125. [13] Dutertre, B., 1999, The Prorty-Celng Protocol: Formalzaton and Analyss usng PVS, Techncal Report, System Desgn Laboratory, SRI Internatonal, http://www.sdl.sr.com/dsa/publs/pro-celng.html. 7