Parallel-Task Scheduling on Multiple Resources

Mike Holenderski, Reinder J. Bril and Johan J. Lukkien
Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, Den Dolech 2, 5600 AZ Eindhoven, The Netherlands

Abstract

This paper addresses the problem of scheduling periodic parallel tasks on a multi-resource platform, where tasks have real-time constraints. The goal is to exploit the inherent parallelism of a platform comprised of multiple heterogeneous resources. A resource model is proposed, which abstracts the key properties of any heterogeneous resource from a scheduling perspective. A new scheduling algorithm called PSRP is presented, which refines MSRP. The schedulability analysis for PSRP is presented. The benefits of PSRP are demonstrated by means of an example application showing that PSRP indeed exploits the available concurrency in heterogeneous real-time systems.

I. INTRODUCTION

Modern computers consist of several processing units, connected via one or more interconnects to a memory hierarchy, auxiliary processors and other devices. A simple approach to sharing such platforms between several applications treats the machine as a single resource: the task having access to the processor implicitly also has access to all other resources, such as a bus, memory, or network. Consequently, only a single task is allowed to execute at a time. On the one hand, this approach avoids the complexity of fine-grained scheduling of multiple resources. On the other hand, it prevents tasks with independent resource requirements from executing concurrently and thus using the available resources more efficiently. For example, a video processing task requiring the processor and operating on the processor's local memory can execute concurrently with a DMA transfer task moving data between the global memory and the network interface. In this paper we assume that a task represents workload which does not necessarily require a processor.

With the advent of multiprocessor platforms new scheduling algorithms have been devised aiming at exploiting some of the available concurrency.
However, they are again limited to tasks which execute on one processor at a time. In this paper we address the problem of scheduling tasks on a multiprocessor platform, where a task can execute on several processors at the same time. Moreover, a task may also specify requirements for other heterogeneous resources, such as a bus, digital signal processor, shared memory variable, etc. In this respect our problem is related to parallel task scheduling on multiple resources, where a task may execute on several processors at the same time.

Parallel-task scheduling was originally investigated in the context of large mainframe computers without real-time constraints [1]. When threads belonging to the same task execute on multiple processors and communicate via shared memory, then it is often desirable to schedule these threads at the same time (called gang scheduling [1]), in order to avoid invalidating the shared memory (e.g. L2 cache) by switching in threads of other tasks. Also, simultaneous scheduling of threads which interact frequently will prevent the otherwise sequential execution due to synchronization points and the large context switching overheads [1]. Parallel-task scheduling is especially desired in data intensive applications (e.g. multimedia processing), where multithreaded tasks operate on the same data (e.g. a video frame), performing different functions at the same rate [2].

To the best of our knowledge, existing literature on preemptive parallel task scheduling with real-time constraints has only considered independent tasks. In this paper we present a fixed-priority preemptive multi-resource scheduling algorithm for parallel tasks with real-time constraints.

Problem description

Current multiprocessor synchronization protocols only consider tasks which execute on a single processor at a time. They are not suitable for synchronizing parallel tasks, which may execute on several processors at a time and share resources.
A simple approach to scheduling such tasks on a platform comprised of multiple heterogeneous resources is to collapse all the processors into one virtual processor and use uniprocessor scheduling. However, this will result in at most one task executing at a time. Our goal in this paper is to provide a scheduling algorithm for parallel tasks with real-time constraints which can exploit the inherent parallelism of a platform comprised of multiple heterogeneous resources.

Contributions

In this paper we propose a new resource model, which abstracts the key properties of different kinds of resources from a scheduling perspective. We then present a new partitioned parallel-task scheduling algorithm called Parallel-SRP (PSRP), which generalizes MSRP [3] for multiprocessors, and the corresponding schedulability analysis for the problem of multi-resource scheduling of parallel tasks with real-time constraints. We show that the algorithm is deadlock-free, derive a maximum bound on blocking, and use this bound as a basis for a schedulability test. We present an example which demonstrates the benefits of PSRP.

Outline

The remainder of this paper is structured as follows. Section II discusses the related work. The system model is introduced in Section III, followed by a recap of MSRP in Section IV. PSRP is presented in Section V, followed by its analysis in Section VI. An example evaluating PSRP is presented in Section VII and a discussion of PSRP in Section VIII. Section IX concludes this paper.

II. RELATED WORK

To the best of our knowledge our work is the first to consider parallel task scheduling on multiple resources and real-time constraints. In this section we discuss the related literature from the domains of multiprocessor scheduling with shared resources and multiprocessor scheduling of parallel tasks.

A. Multiprocessor scheduling with shared resources

When tasks share non-preemptive resources, they may block when the resource they are trying to access is already locked by another task. Such conflicts may be resolved offline by means of a table-driven schedule, or during runtime by means of synchronization protocols.

Dijkstra [4] presents one of the earlier synchronization protocols for multiprocessors, called the Banker's algorithm. The algorithm focuses on avoiding deadlock when several concurrently executing tasks share a common multi-unit non-preemptive resource, but it does not provide any real-time guarantees. Tasks have neither priorities nor timing constraints (besides terminating in a finite amount of time), and each task is assumed to execute on its own processor. They may acquire and release the units of the shared resource in any order, as long as the total number of claimed units does not exceed a specified maximum, and as long as they release all claimed units upon completion. [5] presents a generalization of the Banker's algorithm to several non-preemptive multi-unit resources.

More recently, existing real-time synchronization protocols for uniprocessors have been extended to multiprocessors. They focus on synchronizing access to global resources, which are resources accessed from tasks executing on different processors. There are two main approaches for handling global resources: when a task wants to access a global resource which is already locked by another task executing on a different processor, the task may either (i) be suspended, leaving the processor available for other tasks to execute, or (ii) spin-lock, holding the processor in reserve.
[6] takes the suspension-based approach and presents the Multiprocessor Priority Ceiling Protocol (MPCP) and Distributed Priority Ceiling Protocol (DPCP) (for distributed memory systems). [3] takes the spin-lock approach and presents the Multiprocessor Stack Resource Policy (MSRP). All three protocols assume partitioned EDF scheduling. The Flexible Multiprocessor Locking Protocol (FMLP) by [7] can be regarded as a combination of MPCP and MSRP and can be applied to both partitioned and global EDF multiprocessor scheduling.

The authors in [8], [9], [10] investigate the performance penalties between various spin-lock and suspension-based protocols (MPCP, DPCP, MSRP and FMLP) and conclude that spin-lock based approaches incur a smaller scheduling penalty than suspension-based ones, especially for short critical sections. The authors in [11] extend the original suspension-based MPCP description with spin locking, compare the two implementations and show the opposite, i.e. that for low preemption overheads and long critical sections the suspension-based approaches perform better, while in other settings they perform similarly.

The authors in [7] claim that FMLP outperforms MSRP, by showing that FMLP can schedule more task sets than MSRP. They assume freedom in partitioning the task set, i.e. that tasks may be assigned to arbitrary processors, and exploit this assumption to schedule task sets which are not schedulable under MSRP. Arbitrary partitioning, however, may not necessarily hold for heterogeneous systems, where different processors may provide different functionality.

Our PSRP algorithm is spin-lock based. The advantage of choosing this approach is simpler design and analysis, compared to a suspension-based approach. We base our algorithm on MSRP, as it suits our model better, in the sense that given the particular resource requirements of tasks we cannot exploit the advantages of FMLP.

Nested critical sections can lead to deadlock. MSRP and MPCP explicitly forbid nested global critical sections. FMLP supports nested critical sections by means of resource groups.
Two resources belong to the same resource group iff there exists a task which requests both resources at the same time. Before a task can lock a resource $r$ it must first acquire the lock to the corresponding resource group $G(r)$. This ensures that only a single task can access the resources in a group at any given time. On the other hand, the resource groups in effect introduce large super-resources, which can be accessed by at most one task at a time, thus limiting concurrency in the system.

In this paper we model tasks as sequences of parallel segments which require concurrent access to a set of resources. Nested global critical sections are addressed by a task segment requiring simultaneous access to all of its required resources (similar to the approach proposed in [12] for the general problem of deadlock avoidance in multitasking), without the need for locking entire resource groups.

The description of MSRP in [3], [8] does not address multi-unit resources, which were supported by the original SRP description for a uniprocessor [13]. Our PSRP algorithm supports multi-unit non-preemptive resources.

Notice that the schedulability analysis for PSRP resembles the holistic scheduling analysis presented by [14]. They describe the end-to-end delay for a pipeline of tasks in a distributed system, where each task is bound to a processor and can trigger a task on another processor by sending a message via a shared network. Their tasks correspond to our segments, and their pipelines of tasks correspond to our tasks. However, in their model each task executes on a single processor and may require only local non-preemptive resources. Their model was extended in [15] to include tasks which can synchronize on and generate multiple events. They allow tasks to execute concurrently on different nodes, but do not enforce parallel execution, while in this paper we assume parallel provision of all resources required by a task segment.

B. Multiprocessor scheduling of parallel tasks

While under multiprocessor scheduling of sequential tasks each task executes on exactly one processor at a time, under parallel task scheduling a task needs to execute on several preemptive resources (e.g. processors) or non-preemptive resources (e.g. graphical processing units) simultaneously.

A well-known method for addressing the parallel task scheduling problem is called gang scheduling. It was first introduced in [1], and later discussed among others in [16], [17]. In its original formulation it was intended for scheduling concurrent jobs on large multiprocessor systems.

The work on parallel task scheduling with real-time constraints dates back to [18], where the authors extend Amdahl's law [19] to include communication delay. They estimate the lower and upper bounds on speedup in a multiprocessor system and propose a method for estimating the response time of parallel tasks which incorporates the communication delays. They assume a uniform distribution of workload between the processors. In contrast, in this paper we target systems with arbitrary workload distribution.

More recently, the authors in [2] present a homogeneous multiprocessor scheduling algorithm for independent tasks which encourages individual threads of a multithreaded task to be scheduled together. They observe that when such threads are cooperative and share a common working set, this method enables more effective use of on-chip shared caches resulting from fewer cache misses. They consider a multi-core architecture with symmetric single-threaded cores and a shared L2 cache. They employ the global PD$^2$ and EDF schedulers. Notice that their algorithm only encourages individual threads of a multithreaded task to be scheduled together, unlike gang scheduling, which guarantees that these threads will be scheduled together. Also, the threads belonging to the same task may have different execution times, but a common period.

The authors of [20] present a schedulability analysis for preemptive EDF gang scheduling on multiprocessors. They assume that a task $\tau_i$ requires a subset of $m_i$ homogeneous processors. The authors of [21] adopt the basic fork-join model. A task starts executing in a single master thread until it encounters a fork construct. At that moment it spawns multiple threads which execute in parallel. A join construct synchronizes the parallel threads. Only the master thread can fork and join.
A task can therefore be modeled as an alternating sequence of single- and multi-threaded subtasks. Both [20] and [21] assume fully preemptive and independent tasks.

The authors of [22] address the problem of scheduling independent periodic parallel tasks with implicit deadlines on multi-core processors. They propose a new task decomposition method that decomposes each parallel task into a set of sequential tasks and prove that their task decomposition achieves a resource augmentation bound when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively. They do not consider shared resources.

III. SYSTEM MODEL

In this section we introduce our system model, comprised of the resource model and the application model.

A. Resource model

Definition 1 (Multi-unit resource). Let $R$ be the set of all resources in the system. A multi-unit resource $r \in R$ consists of multiple units, where each unit is a serially accessible entity. A resource $r$ is specified by its capacity $N_r \geq 1$, which represents the maximum number of units the resource can provide simultaneously.

Memory space is an example of a multi-unit resource. In this paper, when talking about the memory space resource we are interested in the memory requirements in terms of memory size, and ignore the specifics of memory allocation and the actual data stored in the memory. A memory, managed as a collection of fixed size blocks with no external fragmentation, can be regarded as a multi-unit resource with capacity equal to the number of blocks. In this sense our multi-unit resource is similar to a multi-unit resource discussed by [13].

The capacity of a multi-unit resource represents essentially the maximum number of tasks which can use the resource simultaneously. A multi-core processor can therefore be modeled as a resource with capacity equal to the number of cores.

A preemption is the change of ownership of a resource unit before the owner is ready to relinquish the ownership. In terms of the traditional task model, a job (representing the ownership of a resource) may be preempted by another job before it completes.
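As a minimal sketch of Definition 1 (an illustration, not part of the original paper), a multi-unit resource can be modeled as a counter bounded by its capacity $N_r$. The class name `MultiUnitResource` and its methods are hypothetical and chosen only for this example.

```python
class MultiUnitResource:
    """Illustrative model of a multi-unit resource with capacity N_r >= 1."""

    def __init__(self, name, capacity):
        if capacity < 1:
            raise ValueError("capacity N_r must be at least 1")
        self.name = name
        self.capacity = capacity   # N_r: units the resource can provide at once
        self.in_use = 0            # units currently owned by tasks

    def available(self):
        """Number of units that are currently free."""
        return self.capacity - self.in_use

    def acquire(self, units=1):
        """Claim `units` units; return False if not enough units are free."""
        if units < 1 or units > self.available():
            return False
        self.in_use += units
        return True

    def release(self, units=1):
        """Give back `units` units that were previously acquired."""
        if units < 1 or units > self.in_use:
            raise ValueError("cannot release more units than are in use")
        self.in_use -= units


# A memory managed as four fixed-size blocks, and a dual-core processor.
memory = MultiUnitResource("memory", capacity=4)
cpu = MultiUnitResource("dual-core", capacity=2)

got_three = memory.acquire(3)   # succeeds, one block remains free
got_two = memory.acquire(2)     # fails, only one block is free
```

The sketch only tracks how many units are owned; which task owns them, and whether ownership may be preempted, is deliberately left out.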
We can classify all resources in one of two categories:

Definition 2 (Preemptive resource). The usage (or ownership) of a unit of a preemptive resource can be preempted without corrupting the state of the resource. We use $P \subseteq R$ to denote the set of all preemptive resources in the system.

Definition 3 (Non-preemptive resource). The usage (or ownership) of a unit of a non-preemptive resource may not be preempted without the risk of corrupting the state of the resource. We use $N \subseteq R$ to denote the set of all non-preemptive resources in the system.

Every resource is either preemptive or non-preemptive, i.e.

$$(N \cup P = R) \wedge (N \cap P = \emptyset). \tag{1}$$

A processor is an example of a preemptive resource, as the processor state of a running task can be saved upon a preemption and later restored. A bus or a logical resource (e.g. a shared variable) are examples of a non-preemptive resource. In the remainder of this paper we assume that all preemptive resources are single-unit, i.e.

$$\forall r \in P : N_r = 1. \tag{2}$$

B. Application model

We consider a set of $n$ synchronous, periodic, parallel tasks denoted by $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$. Each task $\tau_i$ is characterized by a sequence of segments $S_i$, where the $j$-th segment $\tau_{i,j} \in S_i$ is specified by its worst-case execution time $E_{i,j}$, and a set of resource requirements $R_{i,j}$. Each resource requirement $(r, n) \in R_{i,j}$ represents a requirement for $n$ units of resource $r$. During runtime, when a segment $\tau_{i,j}$ is scheduled, all its required resources must be provided simultaneously for the entire duration of $E_{i,j}$. Our task therefore models programs which can be expressed as a sequence of segments, where each segment $\tau_{i,j}$ is wrapped between a lock($R_{i,j}$) and unlock($R_{i,j}$) operation. The semantics of these operations is similar to the primitives used in [12] for locking resources collectively.

A task $\tau_i$ is further specified by its fixed and unique priority $\pi_i$ (lower number indicating higher priority), period $T_i$, which specifies the inter-arrival time between two consecutive instances of task $\tau_i$, and relative deadline $D_i$, with $D_i \leq T_i$.

We will use $S$ to denote the set of all segments among all the tasks, i.e. $S = \bigcup_{\tau_i \in \Gamma} S_i$. To keep the notation short, if a segment $\tau_{i,j}$ requires only single-unit resources we will write $R_{i,j} = \{r_1, r_2, r_3\}$ instead of $R_{i,j} = \{(r_1, 1), (r_2, 1), (r_3, 1)\}$. We will also use the shorthand notation $r \in R_{i,j}$ instead of writing $(r, n) \in R_{i,j}$ for some $n$.

IV. RECAP OF THE MSRP PROTOCOL

In this section we summarize MSRP [3], which forms the basis for the PSRP algorithm proposed in this paper. MSRP is an extension of SRP [13] to multiprocessors.

The authors in [3] assume partitioned multiprocessor scheduling, meaning that each task is statically allocated to a processor. Depending on this allocation, they distinguish between local and global resources: local resources are accessed by tasks assigned to the same processor, while global resources are accessed by tasks assigned to different processors.

The MSRP protocol is defined by the following five rules¹:

1) For local resources, the algorithm is the same as the SRP algorithm. In particular, for every local resource $r$ we define a resource ceiling $\varphi(r)$ greater than or equal to the maximum priority among the tasks using the resource, and for every processor $p$ we define a system ceiling $\Pi(p)$ which at any moment is equal to the maximum resource ceiling among all resources locked by the tasks on $p$. A task is allowed to preempt a task already executing on $p$ only if its priority is higher than $\Pi(p)$.
2) Tasks are allowed to access local resources through nested critical sections. It is possible to nest local and global resources. However, it is not possible to nest global critical sections; otherwise a deadlock can occur.
3) For each global resource $r$, every processor $p$ defines a resource ceiling $\varphi(r)$ greater than or equal to the maximum priority of the tasks on $p$.
4) When a task $\tau_i$, allocated to processor $p$, accesses a global resource $r$, the system ceiling $\Pi(p)$ is raised to $\varphi(r)$, making the task non-preemptable. Then, the task checks if the resource is free: in this case, it locks the resource and executes the critical section. Otherwise, the task is inserted in $r$'s global FIFO queue, and then performs a spin-lock.
5) When a task $\tau_i$, allocated to processor $p$, releases a global resource $r$, the algorithm checks the corresponding FIFO queue, and, in case some other task $\tau_j$ is waiting, it grants access to $r$, otherwise $r$ is unlocked. Then, the system ceiling $\Pi(p)$ is restored to the previous value.

¹ In this paper we consider fixed-priority scheduling, so we ignore the preemption levels in SRP, which are needed for EDF scheduling.

Our PSRP algorithm presented in the following section differs from MSRP in the following ways:

- MSRP disallows nested global critical sections, allowing a task to acquire only a single global resource at a time. PSRP supports nested global critical sections by allowing each segment to acquire several global resources, effectively shifting the inner critical sections outward [12].
- PSRP supports multi-unit non-preemptive resources, while MSRP supports only single-unit non-preemptive resources.
- PSRP allows a task segment to require several preemptive resources (e.g. several processors in parallel), while under MSRP each task requires exactly one preemptive resource.
- Under MSRP, segments requiring global resources execute non-preemptively. In our model we extend the notion of a global resource, which allows parallel segments requiring several preemptive resources to be scheduled non-preemptively.

V. PARALLEL-SRP (PSRP)

We present the Parallel-SRP (PSRP) algorithm, which is inspired by MSRP and can be regarded as its generalization to the parallel task model.

A. Local vs. global resources

MSRP distinguishes between local and global resources. A resource is called local if it is accessed only by tasks assigned to the same processor, otherwise it is called global. When task $\tau_i$ tries to access a local resource which is already acquired by another task $\tau_j$ (executing on the same processor), $\tau_i$ must be suspended to allow $\tau_j$ to continue, so that eventually the resource will be released. When task $\tau_i$ tries to access a global resource which is already acquired by another task $\tau_j$ executing on a different processor, then we have two options for $\tau_i$: we can either suspend it and allow another task assigned to the same processor to do useful work while $\tau_i$ is waiting, or we can have $\tau_i$ perform a spin-lock (also called a "busy wait"). In either case, the blocking time has to be taken into account in the schedulability analysis.

When a task suspends on a global resource, it allows lower priority tasks to execute and acquire other resources, potentially leading to priority inversion. When a task spins on a global resource, it wastes the processor time which could have been used by other tasks. It therefore makes sense to distinguish between local and global resources and to treat them differently.

Similar to MSRP, our PSRP algorithm relies on the notion of local and global resources. Unfortunately, the definition of local and global resources in Section IV assumes that a task requires exactly one processor, and hence it is not sufficient for our parallel task model. We therefore generalize the notion of local and global resources. The essential property of a global resource is that it is required by segments which can attempt to access their resources independently of each other (e.g. segments which are not synchronized on one shared processor).

Definition 4 (Local and global resources). We define a resource $r$ as local if (i) it is preemptive and accessed only by segments which require only non-preemptive resources besides $r$, or (ii) it is non-preemptive and accessed only by segments which also share one and the same preemptive resource $p$. Otherwise the resource is global. We use $R^L$ and $R^G$ to denote the sets of local and global resources, respectively.

Notice that the local/global classification in MSRP is limited only to non-preemptive resources, while in our definition it also includes preemptive resources. Figure 1 illustrates the difference between local and global resources.

Fig. 1. Example illustrating local and global resources and segments in a segment requirements graph, for a system comprised of $P = \{p_1, p_2, p_3\}$, $N = \{n_1, n_2, n_3, n_4\}$, $S = \{a_1, b_1, c_1, c_2, d_1, d_2, e_1, f_1\}$, $\Gamma = \{a, b, c, d, e, f\}$ with $S_a = \langle a_1 \rangle$, $S_b = \langle b_1 \rangle$, $S_c = \langle c_1, c_2 \rangle$, $S_d = \langle d_1, d_2 \rangle$, $S_e = \langle e_1 \rangle$, $S_f = \langle f_1 \rangle$, and $R_{a_1} = \{p_1\}$, $R_{b_1} = \{p_1, n_1\}$, $R_{c_1} = \{p_1, n_1, n_2\}$, $R_{c_2} = \{p_2, p_3, n_2\}$, $R_{d_1} = \{p_3, n_2, n_3\}$, $R_{d_2} = \{n_3\}$, $R_{e_1} = R_{f_1} = \{p_3, n_4\}$.

B. Local vs. global segments

Similarly to MSRP distinguishing between local and global critical sections (guarding access to local and global resources, respectively), in PSRP we distinguish between local and global segments.

Definition 5 (Local and global segments). We define a local segment as one requiring at least one local preemptive resource, otherwise the segment is global. We use $S^L$ and $S^G$ to denote the sets of local and global segments, respectively.

Figure 1 illustrates the difference between local and global segments. The intention of PSRP is to schedule segments $a_1, b_1, c_1$ preemptively and in priority order, and $c_2, d_1, d_2, e_1, f_1$ non-preemptively and in FIFO order.
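The classification in Definitions 4 and 5 can be checked mechanically on the system of Fig. 1. The sketch below is an illustration under stated assumptions, not the authors' code: the requirement sets are taken from the Fig. 1 caption, the function names are hypothetical, and condition (ii) of Definition 4 is read literally as "some preemptive resource is required by every segment accessing $r$".

```python
# Resource sets and segment requirements as given in the Fig. 1 caption.
P = {"p1", "p2", "p3"}            # preemptive resources
N = {"n1", "n2", "n3", "n4"}      # non-preemptive resources
R = {
    "a1": {"p1"},
    "b1": {"p1", "n1"},
    "c1": {"p1", "n1", "n2"},
    "c2": {"p2", "p3", "n2"},
    "d1": {"p3", "n2", "n3"},
    "d2": {"n3"},
    "e1": {"p3", "n4"},
    "f1": {"p3", "n4"},
}


def users(r):
    """Segments that require resource r."""
    return [s for s, req in R.items() if r in req]


def is_local_resource(r):
    """Definition 4, under the literal reading stated in the lead-in."""
    segs = users(r)
    if r in P:
        # (i) every user requires only non-preemptive resources besides r
        return all(R[s] - {r} <= N for s in segs)
    # (ii) assumed reading: one preemptive resource is shared by all users of r
    common = set(P)
    for s in segs:
        common &= R[s]
    return bool(common)


def is_local_segment(s):
    """Definition 5: the segment requires at least one local preemptive resource."""
    return any(r in P and is_local_resource(r) for r in R[s])


local_segments = {s for s in R if is_local_segment(s)}
global_segments = set(R) - local_segments
```

Under these assumptions the sketch yields the local segments $a_1, b_1, c_1$ and the global segments $c_2, d_1, d_2, e_1, f_1$, which matches the intended scheduling split stated above.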
Resource holding time is the duration of a continuous time interval during which a segment owns a resource, preventing other segments from accessing it. Minimizing the holding time is important in the schedulability analysis, as it adds to the blocking time. Non-preemptive scheduling of global segments will keep the holding times of global resources short. Our choice for executing global segments in FIFO order is in line with MSRP.

C. The PSRP algorithm

Under PSRP, a segment may be ready, executing, waiting (on a global resource), or blocked (on a local resource). A segment may be both waiting and blocked at the same time. A task job inherits the state from its currently active segment.

The PSRP algorithm follows the following set of rules:

1) For local resources the algorithm is the same as SRP. In particular, for every local non-preemptive resource $r \in R^L \cap N$, we define a resource ceiling $\varphi(r)$ to be equal to the highest priority of any task requiring $r$. For every local preemptive resource $p \in R^L \cap P$ we define a system ceiling $\Pi(p)$ which at any moment is equal to the maximum resource ceiling among all local non-preemptive resources locked by any segment that also locks $p$. We also equip $p$ with a ready queue queue($p$), which stores tasks waiting for or executing on $p$ sorted by priority. The task at the head of queue($p$) is allowed to preempt a task already executing on $p$ only if its priority is higher than $\Pi(p)$.
2) Each global resource $r \in R^G$ is equipped with a FIFO resource queue queue($r$)². The resource queue stores tasks which are waiting for or executing on the resource.
3) When a task $\tau_i$ attempts to lock a set of resources $R$ using lock($R$), it is inserted into the resource queues of all global resources in $R$. Moreover, this insertion is atomic, meaning that no other task can be inserted into any of the resource queues in $R$ before $\tau_i$ has been inserted into all queues in $R$. When a task $\tau_i$ releases a set of resources $R$ using unlock($R$), it is removed from the resource queues of all global resources in $R$. Each unlock($R$) must be preceded by a lock($R$) call, with the same $R$.
4) A task $\tau_i$ is said to be ready at time $t$ if for all resources $r$ required by its currently active segment $\tau_{i,j}$: $r$ has enough units available, and if $r \in R^G \cup (R^L \cap P)$ then $\tau_i$ is at the head of queue($r$), and if $r \in R^L \cap P$ then $\pi_i < \Pi(r)$.
5) If after adding a task $\tau_i$ to the queue of resource $r$ the head of the queue has changed (i.e. if $\tau_i$ ends up at the head), or if after removing a task from a resource queue the queue is not empty, then the scheduler checks if the head task is ready. If so, then the task is scheduled and becomes executing. Otherwise, $\tau_i$ performs a spin-lock (at the highest priority) on each resource containing $\tau_i$ at the head of its queue and becomes waiting³. Notice that if a task $\tau_i$ requires several units of a resource $r$, then it will spin-lock until enough of the segments currently using $r$ have completed and released a sufficient number of units of $r$ for $\tau_i$ to continue.
6) The following invariant is maintained: the system ceiling of each local preemptive resource is equal to the top priority whenever a segment requiring a global resource is spinning or executing on it.

² MSRP also equips each global resource with a FIFO queue.
³ On some resources spinning may not make sense (e.g. spinning on a bus). Spinning essentially reserves a resource, preventing other tasks from executing on it, and can be implemented differently on different resources.

VI. SCHEDULABILITY ANALYSIS FOR PSRP

We first show that PSRP suffers from neither deadlock nor livelock, and then we proceed to bound the worst-case response time (WCRT) of tasks.

Lemma 1. PSRP does not suffer from deadlock.

Proof: Since every lock($R$) is accompanied by a corresponding unlock($R$), with the same $R$, each locked resource will eventually be unlocked, provided no segment is blocked indefinitely. Access to local resources is synchronized using SRP, which was proved to be deadlock free by [13]. We therefore need to show the absence of dependency cycles when accessing global resources, in particular when (i) segments are waiting for resources and (ii) after they have started executing.

(i) Let us assume a segment $\tau_{i,j}$ is executing lock($R_{i,j}$) and it needs to wait for some of the resources in $R_{i,j}$. Since the resource queues handle the tasks in FIFO order and since the addition to all queues in $R_{i,j}$ is atomic, a dependency cycle when segment $\tau_{i,j}$ is waiting for resources is not possible.

(ii) Since we have assumed that critical sections do not span across task segments and that all resources required by a segment are provided simultaneously (i.e. either all or none), once a segment starts executing it will be able to complete and release the acquired resources.

Hence a deadlock cannot occur.

Lemma 2. PSRP does not suffer from livelock.

Proof: Similarly to Lemma 1, we need to show the absence of livelock when segments are accessing global resources. In particular, we need to show that every segment $\tau_{i,j}$ requiring global resources will eventually start executing. According to Lemma 1, a segment will not deadlock during its waiting phase, so, as long as other segments waiting in a resource queue in front of it eventually complete, it will eventually start executing. Since each segment needs to execute for a finite amount of time, and since, according to Lemma 1, it will not deadlock during the execution phase either, every segment inserted into a resource queue will have to wait for at most a finite amount of time.
Hence a livelock cannot occur.

To show that tasks will meet their real-time constraints, we derive the bound on their WCRT. To see if a task is schedulable we check if the WCRT of its last segment, measured from the arrival time of the task, is within the task's deadline.

Lemma 3. A local segment requires exactly one preemptive local resource.

Proof: According to Definition 4, segments which share a local resource share exactly one preemptive resource. According to Definition 5, every local segment requires at least one local preemptive resource. Lemma 3 follows.

Lemma 4. A global segment requires at least one global resource.

Proof: Let $s$ be a global segment. Then, according to Definition 5, all resources required by $s$ are either local non-preemptive or global. We can safely assume that each segment requires at least one resource. We now show by contradiction that no segment can require only local non-preemptive resources. So, let us assume that there exists a segment $x$ which does require only local non-preemptive resources. According to Definition 4, all segments which require any of the resources in $R_x$ will also require exactly one preemptive resource. But this also holds for segment $x$ itself, which contradicts the assumption that all resources required by $x$ are local non-preemptive. Since no segment can require only local non-preemptive resources, and since each resource required by a global segment is either local non-preemptive or global, each global segment must require at least one global resource.

According to Lemma 3, a local segment requires exactly one preemptive resource. This preemptive resource will dictate the behavior of the local segments sharing it. PSRP will use the priority-ordered ready queue to schedule local segments based on their priority. According to Lemma 4, a global segment requires at least one global resource, and according to Definition 5 it does not require any local preemptive resources. PSRP will use the resource queues attached to the global resources to schedule the global segments non-preemptively in FIFO order.
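The FIFO handling of global segments can be sketched in a few lines. This is a simplified illustration rather than the paper's implementation: it assumes single-unit global resources only, ignores local resources, priorities and spinning, and uses a hypothetical class `GlobalQueues` in which one mutex makes the insertion into all resource queues atomic.

```python
import threading
from collections import deque


class GlobalQueues:
    """Illustrative FIFO resource queues for single-unit global resources."""

    def __init__(self, resources):
        self.queue = {r: deque() for r in resources}
        self._mutex = threading.Lock()   # guards all queues together

    def lock(self, task, resources):
        """Insert `task` into every queue in `resources` as one atomic step."""
        with self._mutex:
            for r in resources:
                self.queue[r].append(task)
        return self.is_ready(task, resources)

    def is_ready(self, task, resources):
        """A task may execute once it heads the queue of every required resource."""
        return all(
            len(self.queue[r]) > 0 and self.queue[r][0] == task
            for r in resources
        )

    def unlock(self, task, resources):
        """Remove `task` from every queue in `resources`."""
        with self._mutex:
            for r in resources:
                self.queue[r].remove(task)


# Hypothetical tasks x and y that overlap on resource r2.
q = GlobalQueues(["r1", "r2", "r3"])
x_ready = q.lock("x", ["r1", "r2"])           # x heads both of its queues
y_ready = q.lock("y", ["r2", "r3"])           # y is behind x in queue(r2)
q.unlock("x", ["r1", "r2"])
y_ready_after = q.is_ready("y", ["r2", "r3"])  # y now heads queue(r2) and queue(r3)
```

In this sketch task y would be spinning on r3 while it waits for x to leave queue(r2); the spinning itself is not modeled.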
In the remainer of this setion we erive an equation for the WCRT for gloal an loal segments. A. Response time of gloal segments A gloal segment spin-loks an exeutes on all its require resoures at the highest priority. Consequently, as a gloal segment annot e preempte, its response time is omprise of three time intervals, as illustrate in Figure 2. elay ue to previous segments require y the same task waiting for require resoures segment exeution time Legen: task arrival previous segments urrent segment Fig. 2. Response time of a gloal segment. The elay ue to segments preeing τ i,j in S i is equal to the response time of the previous segment, whih an e ompute y iterating through the sequene starting with the first segment. The exeution time of segment τ i,j is simply E i,j. The interesting part is the time that segment τ i,j spens waiting for resoures in R i,j. The MSRP algorithm assumes that at any time eah gloal segment τ i,j requires one preemptive an at most one nonpreemptive resoure. Also, aess to a gloal resoure is grante to segments in FIFO orer. Consequently, they oserve that the worst-ase spinning time of segment τ i,j on a preemptive resoure is equal to the sum of the segment exeution times of all segments sharing the non-preemptive

resource with $\tau_{i,j}$. In our model, a segment can require an arbitrary number of preemptive and non-preemptive resources, which may result in a longer spinning time.

Definition 6. The requirements of all segments can be represented by a segment requirements graph $G = (V, E)$ where the set of vertices $V = R \cup S$, and the set of edges $E \subseteq S \times R$ represents the resource requirements of segments, i.e.

$$(\tau_{i,j}, r) \in E \Leftrightarrow (\tau_{i,j} \in S \wedge r \in R \wedge r \in R_{i,j}). \tag{3}$$

The graph is tripartite, as we can divide $E$ into two disjoint sets $E^P$ and $E^N$, such that

$$\forall (\tau_{i,j}, r) \in E^P : (\tau_{i,j} \in S \wedge r \in P), \tag{4}$$
$$\forall (\tau_{i,j}, r) \in E^N : (\tau_{i,j} \in S \wedge r \in N). \tag{5}$$

Example 1. Consider a platform comprised of four processors $P = \{p_1, p_2, p_3, p_4\}$, executing an application consisting of four tasks, each containing one segment. We name these segments $S = \{a, b, c, d\}$, and define their resource requirements as follows: $R_a = \{p_1\}$, $R_b = \{p_1, p_2\}$, $R_c = \{p_2, p_3\}$, and $R_d = \{p_3, p_4\}$, as shown in Figure 3. Notice that all resources and segments are global.

Fig. 3. A segment requirements graph for a system comprised of $P = \{p_1, p_2, p_3, p_4\}$, $N = \emptyset$, $S = \{a, b, c, d\}$ with $R_a = \{p_1\}$, $R_b = \{p_1, p_2\}$, $R_c = \{p_2, p_3\}$, and $R_d = \{p_3, p_4\}$.

Let us assume a scenario where the processors are idle and segments $a, b, c, d$ arrive soon after each other, as shown in Figure 4. When segment $a$ arrives and processor $p_1$ is idling, it is immediately scheduled and starts executing. When segment $b$ arrives, requiring processors $p_1$ and $p_2$, and encounters a busy processor $p_1$, it is added to the resource queues queue($p_1$) and queue($p_2$). Since it is at the head of queue($p_2$) it starts spinning on $p_2$ (at the highest priority). Soon after, segment $c$ arrives and similarly is inserted into the resource queues queue($p_2$) and queue($p_3$) and starts spinning on $p_3$. When segment $d$ arrives soon after segment $c$, it is inserted into queue($p_3$) and queue($p_4$) and starts spinning on $p_4$. When segment $a$ completes and releases $p_1$, it is removed from queue($p_1$), enabling segment $b$, which starts executing. This process continues, subsequently releasing segments $c$ and $d$.
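The chained start times in Example 1 can be reproduced with a small simulation. This is a minimal sketch, not the PSRP implementation: it assumes every segment is inserted into all of its resource queues at arrival, that queues are FIFO, and that execution is non-preemptive. The execution times and arrival offsets are hypothetical values chosen for illustration, since the example does not state any.

```python
# Resource requirements of Example 1.
requirements = {"a": {"p1"}, "b": {"p1", "p2"}, "c": {"p2", "p3"}, "d": {"p3", "p4"}}
# Hypothetical execution times and arrival instants (not given in the text).
exec_time = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}
arrival = {"a": 0.0, "b": 0.1, "c": 0.2, "d": 0.3}


def simulate_fifo(requirements, exec_time, arrival):
    """Return {segment: (start, finish)} for non-preemptive FIFO resource queues."""
    schedule = {}
    # Handle segments in arrival order; each one joins all its queues at once.
    for seg in sorted(arrival, key=arrival.get):
        start = arrival[seg]
        for earlier, (_, finish) in schedule.items():
            # An earlier segment sharing a resource is ahead in that queue,
            # so this segment cannot start before it has finished.
            if requirements[seg] & requirements[earlier]:
                start = max(start, finish)
        schedule[seg] = (start, start + exec_time[seg])
    return schedule


schedule = simulate_fifo(requirements, exec_time, arrival)
```

With these values each segment starts exactly when its predecessor in the chain finishes, so `d` is delayed by `a`, `b` and `c` even though it shares a resource only with `c`.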
Notice that segment $d$ cannot start executing before $c$ has completed, which cannot start before $b$ has completed, which cannot start before $a$ has completed. A segment may be required to wait inside of a resource queue, either passively waiting in the queue's tail or actively spinning at its head. Example 1 suggests that, under PSRP, a segment may need to wait on its required resources until all segments which it depends on in the segment requirements graph have completed.

Fig. 4. Example of parallel-chained blocking of global segments. The figure shows the arrival and execution of segments $S = \{a, b, c, d\}$ on preemptive resources $P = \{p_1, p_2, p_3, p_4\}$ and the contents of their resource queues. Since we assume that each task contains only one segment, for ease of presentation we refer to the tasks inside the resource queues by the corresponding segment names. Legend: segment arrival, segment execution, segment spinning, resource queue.

We can observe, however, that segments belonging to the same task are executed sequentially (by definition), and therefore cannot interfere with each other. The time that segment $\tau_{i,j}$ may need to wait is therefore limited to those segments which $\tau_{i,j}$ depends on if we ignore segments belonging to the same task. Moreover, only segments which require at least one global resource may wait inside of a resource queue. Segments which require only local resources will never be inserted into a resource queue, because resource queues are associated exclusively with global resources.

We now define the notion of a partial segment requirements graph, which includes only those dependencies in a segment requirements graph which are indeed feasible. We use these graphs later to formalize the notion of dependency.

Definition 7. A partial segment requirements graph $G' = (V', E')$ derived from segment requirements graph $G = (V, E)$ is a subgraph of $G$, with $V' \subseteq V$ and $E' \subseteq E$, such that
1) $V'$ contains global resources, but no local resources, i.e.
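$R^G \subseteq V' \wedge R^L \cap V' = \emptyset$,
2) if $\tau_i$ requires at least one global resource, then there is exactly one segment from $S_i$ in $V'$, i.e. $\forall \tau_i \in \Gamma : ((\exists \tau_{i,j} \in S_i : R_{i,j} \cap R^G \neq \emptyset) \Rightarrow |\{\tau_{i,j} \mid \tau_{i,j} \in V'\}| = 1)$,
3) segments requiring only local resources are ignored, i.e. $\forall \tau_{i,j} \in S : R_{i,j} \subseteq R^L \Rightarrow \tau_{i,j} \notin V'$,
4) $E'$ contains all the edges (and only those edges) from $E$ which have both endpoints in $V'$, i.e. $\forall \{a, b\} \in E : (a \in V' \wedge b \in V') \Leftrightarrow \{a, b\} \in E'$.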
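The four conditions of Definition 7 amount to choosing one global-resource-requiring segment per task and keeping only its edges to global resources. The following is a minimal sketch of that enumeration; the function name, the dictionary representation (segment mapped to its set of global resources) and the example task set are assumptions made for illustration. Global resources that no chosen segment requires are left out of the representation, since they do not affect reachability.

```python
from itertools import product


def partial_graphs(tasks, requirements, global_resources):
    """Yield each partial graph as {segment: set of required global resources}."""
    candidates = []
    for segments in tasks.values():
        # Condition 3: ignore segments that require only local resources.
        with_global = [s for s in segments if requirements[s] & global_resources]
        if with_global:
            # Condition 2: exactly one such segment per task is chosen below.
            candidates.append(with_global)
    for choice in product(*candidates):
        # Conditions 1 and 4: keep only the edges that end in a global resource.
        yield {s: requirements[s] & global_resources for s in choice}


# Hypothetical task set: g1, g2 are global resources, l1 is a local resource.
tasks = {"x": ["x1", "x2"], "y": ["y1", "y2"], "z": ["z1"]}
requirements = {
    "x1": {"g1"}, "x2": {"g2", "l1"},
    "y1": {"g1", "g2"}, "y2": {"l1"},
    "z1": {"l1"},
}
graphs = list(partial_graphs(tasks, requirements, {"g1", "g2"}))
```

Here task `x` contributes two alternatives, task `y` one, and task `z` none, so two partial graphs are produced.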

Condition 1 in Definition 7 makes sure that segments which require only local resources will be unreachable from global segments. Condition 3 removes those segments from a partial segment requirements graph to keep it concise.

Definition 8. We define $partial(G)$ as the set of all possible partial segment requirements graphs which can be derived from the segment requirements graph $G$.

Figure 5 illustrates the partial graphs derived from the segment requirements graph in Figure 1.

Fig. 5. Partial segment requirements graphs $G_1$, $G_2$, $G_3$, $G_4$ derived from the segment requirements graph in Figure 1, assuming tasks $\Gamma = \{a, b, c, d\}$, with $S_a = \langle a_1 \rangle$, $S_b = \langle b_1 \rangle$, $S_c = \langle c_1, c_2 \rangle$, $S_d = \langle d_1, d_2 \rangle$.

Definition 9. Let $G = (V, E)$ be a segment requirements graph. We define $\delta(\tau_{i,j}, g)$ to be the set of segments which $\tau_{i,j}$ can reach in the partial segment requirements graph $g \in partial(G)$. We say that $\tau_{i,j}$ can reach $\tau_{x,y}$ in $g$ iff both segments belong to the same connected subgraph of $g$, and $\tau_{i,j} \neq \tau_{x,y}$. We say that $\tau_{i,j}$ depends on $\tau_{x,y}$ iff

$$\exists g \in partial(G) : \tau_{x,y} \in \delta(\tau_{i,j}, g). \tag{6}$$

Notice that the dependency relation is symmetric, i.e.

$$\tau_{x,y} \in \delta(\tau_{i,j}, g) \Rightarrow \tau_{i,j} \in \delta(\tau_{x,y}, g), \tag{7}$$

and transitive, i.e.

$$\tau_{x,y} \in \delta(\tau_{i,j}, g) \wedge \tau_{i,j} \in \delta(\tau_{a,b}, g) \Rightarrow \tau_{x,y} \in \delta(\tau_{a,b}, g). \tag{8}$$

Example 2. Figure 6 shows an example of the dependencies in the partial segment requirements graphs in Figure 5.

| $\tau_{i,j}$ | $\delta(\tau_{i,j}, G_1)$ | $\delta(\tau_{i,j}, G_2)$ | $\delta(\tau_{i,j}, G_3)$ | $\delta(\tau_{i,j}, G_4)$ |
| $c_1$ | $\{d_1, e_1, f_1\}$ | | | |
| $c_2$ | | | $\{d_1, e_1, f_1\}$ | $\{e_1, f_1\}$ |
| $d_1$ | $\{c_1, e_1, f_1\}$ | | $\{c_2, e_1, f_1\}$ | |
| $d_2$ | | | | |
| $e_1$ | $\{c_1, d_1, f_1\}$ | $\{f_1\}$ | $\{c_2, d_1, f_1\}$ | $\{c_2, f_1\}$ |
| $f_1$ | $\{c_1, d_1, e_1\}$ | $\{e_1\}$ | $\{c_2, d_1, e_1\}$ | $\{c_2, e_1\}$ |

Fig. 6. Dependencies for segments in Figure 5, where $G_1$, $G_2$, $G_3$, $G_4$ represent the partial segment requirements graphs in Figure 5.

Lemma 5. Under PSRP, each segment $\tau_{i,j} \in S$ will have to wait on global resources before it can start executing for at most

$$wait(\tau_{i,j}) = \max\Big\{0, \max_{g \in partial(G)} \sum_{\tau_{x,y} \in \delta(\tau_{i,j}, g)} E_{x,y}\Big\}, \tag{9}$$

where $G$ is the segment requirements graph.
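The bound in (9) can be evaluated by a connected-component search in each partial graph. The sketch below is one possible way to do this, under assumptions: each partial graph is given as a dictionary from segment to its set of global resources, and the helper names `reachable` and `wait` as well as the execution times are hypothetical. It is applied to Example 1, where every task has one segment and there is therefore a single partial graph.

```python
def reachable(segment, graph):
    """Segments in the same connected component as `segment`, excluding itself."""
    seen, frontier = {segment}, [segment]
    while frontier:
        current = frontier.pop()
        for other, resources in graph.items():
            # Two segments are adjacent when they share a global resource.
            if other not in seen and resources & graph[current]:
                seen.add(other)
                frontier.append(other)
    return seen - {segment}


def wait(segment, graphs, exec_time):
    """Equation (9): maximum over partial graphs of the summed execution times."""
    totals = [sum(exec_time[s] for s in reachable(segment, g))
              for g in graphs if segment in g]
    return max([0.0] + totals)


# Example 1: the chain a - b - c - d; execution times are assumed values.
graph = {"a": {"p1"}, "b": {"p1", "p2"}, "c": {"p2", "p3"}, "d": {"p3", "p4"}}
exec_time = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}
wait_d = wait("d", [graph], exec_time)
wait_a = wait("a", [graph], exec_time)
```

Because reachability is symmetric, the bound for `a` also includes `b`, `c` and `d`, even though `a` would start immediately in the scenario of Example 1.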
Proof: Consider the situation when a segment $\tau_{i,j}$ tries to start executing and acquire resources in $R_{i,j}$. If any of the resources is not available, $\tau_{i,j}$ will have to wait. Let $wait\_resource(\tau_{i,j}, r)$ be the worst-case time that segment $\tau_{i,j}$ may spend waiting due to resource $r$.

When $\tau_{i,j}$ tries to access a global resource $r$ which is not available, then $\tau_i$ will be inserted at the end of queue($r$). Since queue($r$) is a FIFO queue, a segment $\tau_{x,y}$ residing inside of queue($r$) in front of $\tau_{i,j}$ will have to complete first, before $\tau_x$ can be added at the end of queue($r$) again. Hence a task may be represented only once inside of a resource queue, and therefore the length of the resource queue is at most equal to the number of tasks requiring $r$. In other words, a task $\tau_x$ sharing resource $r$ with segment $\tau_{i,j}$ will interfere with $\tau_{i,j}$ (during the time $\tau_{i,j}$ is waiting on $r$) for the duration of at most one of its segments in $S_x$.

Let $B(\tau_{i,j}, r)$ be the worst-case set of segments which are waiting in queue($r$) in front of $\tau_{i,j}$. Each segment $\tau_{x,y} \in B(\tau_{i,j}, r)$ can itself be waiting on other resources: for each resource $s \in R_{x,y}$, segment $\tau_{x,y}$ may need to wait for all segments in $B(\tau_{x,y}, s)$. For each of those segments in $B(\tau_{x,y}, s)$ we can apply the same reasoning. In effect, segment $\tau_{i,j}$ may need to wait for many segments which it indirectly depends on.

A straightforward approach would be to designate all segments which are reachable from $\tau_{i,j}$ in $G$ as the set that segment $\tau_{i,j}$ depends on. We now show how to bound this set by removing the unnecessary vertices from $G$.
(i) The fact that $\tau_{x,y}$ is inside of a resource queue implies that its priority is higher or equal to the system ceiling of any preemptive resource it may require, meaning that it cannot be waiting any more for local resources. We therefore need to consider only global resources.
(ii) At any moment in time only one segment of a task can be active. Therefore, segment $\tau_{i,j}$ will not depend on segments belonging to the same task, i.e. segments in $S_i \setminus \{\tau_{i,j}\}$.
(iii) Moreover, segment $\tau_{i,j}$ will not depend on any segment which a segment $\tau_{i,k}$ from the same task depends on, unless $\tau_{i,j}$ also depends on it after removing $\tau_{i,k}$ from $G$. The same holds for any other segment in $S$.

According to (i), (ii) and (iii) we need to consider only segments which are reachable from $\tau_{i,j}$ in $G$, after we remove the vertices corresponding to the local resources, and segments belonging to the same task from $G$. In other words, segment $\tau_{i,j}$ depends only on segments $\tau_{x,y}$, such that (according to Definitions 8 and 9) $\tau_{x,y} \in \delta(\tau_{i,j}, g)$, where $g \in partial(G)$. Moreover, since (according to Lemma 1) there are no dependency cycles, we need to consider only a single job of each $\tau_{x,y}$.

Segment $\tau_{i,j}$ will have to wait for $wait\_resource(\tau_{i,j}, r)$ time on all resources $r \in R_{i,j}$. Since a segment is inserted into the resource queues of all resources $r \in R_{i,j}$ simultaneously, and any spin-locks are performed concurrently, its total waiting time is given by (9).

Example 3. Figure 7 shows an example of the waiting times for segments in the partial segment requirements graphs in Figure 5 for example values of $E_{i,j}$.

| $\tau_{i,j}$ | $E_{i,j}$ | $wait(\tau_{i,j})$ |
| $c_1$ | 2 | 3 |
| $c_2$ | 2 | 3 |
| $d_1$ | 2 | 3 |
| $d_2$ | 16 | 0 |
| $e_1$ | 0.5 | 4.5 |
| $f_1$ | 0.5 | 4.5 |

Fig. 7. Waiting times for segments in Figure 5.

Note that (9) is pessimistic. Figure 8 illustrates the source of the pessimism for $wait(b)$. According to PSRP, segment $b$ may be delayed by $a$ or $c$, but not both. Lemma 5, however, assumes that in the worst case $b$ will have to wait for both $a$ and $c$, which is pessimistic in case $a$ and $c$ do not share a common resource.

Fig. 8. A segment requirements graph for a system comprised of $P = \{p_1, p_2\}$, $N = \emptyset$, $S = \{a, b, c\}$ with $R_a = \{p_1\}$, $R_b = \{p_1, p_2\}$, and $R_c = \{p_2\}$.

Corollary 1. A segment $\tau_{i,j}$ which requires only local resources will never have to wait inside of a resource queue, i.e. $\forall \tau_{i,j} \in S : R_{i,j} \subseteq R^L \Rightarrow wait(\tau_{i,j}) = 0$.

Definition 10. For segment $\tau_{i,j}$ we use $E(\tau_{i,j}) = wait(\tau_{i,j}) + E_{i,j}$ to denote the execution time of $\tau_{i,j}$ extended with its waiting time.

Definition 11. We define $A(\tau_{i,j})$ to be the worst-case activation time of segment $\tau_{i,j}$ relative to the arrival time of its parent task $\tau_i$. $A(\tau_{i,j})$ is equal to the WCRT of the previous segment in $S_i$, or 0 in case $\tau_{i,j}$ is the first segment in $S_i$, i.e.

$$A(\tau_{i,j}) = \begin{cases} WCRT(\tau_{i,j-1}) & \text{if } j > 1, \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

Theorem 1. Under PSRP, the WCRT of a global segment $\tau_{i,j} \in S^G$, measured since the arrival of the parent task, is bounded by

$$WCRT(\tau_{i,j}) = A(\tau_{i,j}) + E(\tau_{i,j}). \tag{11}$$

Proof: Since each segment $\tau_{i,j}$ belonging to task $\tau_i$ is dispatched only after the previous segment $\tau_{i,j-1}$ has completed (or when $\tau_i$ has arrived, in case of the first segment), and since we assume $D_i \le T_i$ for all tasks $\tau_i$, segments belonging to the same task do not interfere with each other. Since each segment is dispatched immediately after the previous one has completed (or at the moment $\tau_i$ has arrived, in case of the first segment), there is no idle time between the segments. Therefore, segment $\tau_{i,j}$ will attempt to lock its required resources at time $A(\tau_{i,j})$. At this moment it will start waiting on all the resources which it requires but which are unavailable. It will wait for at most $wait(\tau_{i,j})$ time units. Since segments spin at the highest priority, immediately after it stops spinning it will start executing. Also, since we assume that all nested critical sections have been shifted outwards and since the system ceiling of all resources in $R_{i,j}$ is raised to the top priority at the moment $\tau_{i,j}$ starts executing, segment $\tau_{i,j}$ cannot be preempted nor blocked once it starts executing. In the worst case it will therefore execute for $E_{i,j}$ time before completing. In order to compute the WCRT of a global segment $\tau_{i,j}$, we therefore simply have to sum up its release jitter, total waiting time and execution time.

B. Response time of local segments

In this section we derive the WCRT of a local segment.

Lemma 6. Under PSRP, the maximum blocking that a local segment $\tau_{i,j}$ can experience is given by

$$B(\tau_{i,j}) = \begin{cases} \max\{B^L(\tau_{i,j}), B^G(\tau_{i,j})\} & \text{if } \forall r \in R_{i,j} : r \in R^L, \\ B^L(\tau_{i,j}) & \text{otherwise,} \end{cases} \tag{12}$$

where

$$B^L(\tau_{i,j}) = \max\{E_{x,y} \mid R_{x,y} \cap R^G = \emptyset \wedge \pi_x > \pi_i \wedge (\exists r \in R_{x,y} \cap R_{i,j} : \varphi(r) \le \pi_i)\}, \tag{13}$$
$$B^G(\tau_{i,j}) = \max\{E(\tau_{x,y}) \mid R_{x,y} \cap R^G \neq \emptyset \wedge \pi_x > \pi_i \wedge R_{x,y} \cap R_{i,j} \neq \emptyset\}. \tag{14}$$

Proof: A local segment $\tau_{i,j}$ can be blocked by local and global segments. Let $B^L(\tau_{i,j})$ and $B^G(\tau_{i,j})$ be the blocking time experienced by $\tau_{i,j}$ due to local and global resources, respectively. Global segments use only global resources (according to Definition 5).
Local segments therefore only compete with local segments on local resources. Access to local resources is managed using SRP. According to SRP, segment $\tau_{i,j}$ may be blocked by a lower priority segment only once, before $\tau_{i,j}$ starts executing. Moreover, this blocking time is equal to the length of the longest segment among those which have a lower priority than $\tau_{i,j}$ and share resources with $\tau_{i,j}$. Equation (13) follows. A local segment $\tau_{i,j}$ which requires only local resources may also be blocked by a lower priority local segment $\tau_{x,y}$ which requires also global resources, when it spin-locks or

executes on those global resources. According to Lemma 3, every local segment uses exactly one local preemptive resource. The PSRP algorithm allows a segment to execute the lock() operation only if its priority is higher than the system ceiling of the local preemptive resource shared with $\tau_{i,j}$. Since the ready segments are scheduled on the preemptive resource according to their priority, $\tau_{i,j}$ can be blocked by only one segment $\tau_{x,y}$ and at most once. Moreover, $\tau_{x,y}$ must have started executing before $\tau_{i,j}$ has arrived, otherwise $\tau_{i,j}$ would have been scheduled instead. According to Lemma 1 the resource holding time of segment $\tau_{x,y}$ on each of its required resources is bounded by $E(\tau_{x,y})$. Equation (14) follows.

Since (according to Definitions 4 and 5) exactly one preemptive resource will be shared between all local segments sharing resources with $\tau_{i,j}$, this preemptive resource will synchronize the access to all other (non-preemptive) resources required by $\tau_{i,j}$. Segment $\tau_{i,j}$ which requires only local resources can therefore block on either a local segment or a global segment, but not both. The first condition in (12) follows.

A local segment $\tau_{i,j}$ which requires at least one global resource will start spinning at the highest priority as soon as it reaches the highest priority on the preemptive resource. Since the spinning time is already taken into account in $E(\tau_{i,j})$, we only need to consider blocking on local segments, and can ignore blocking on global segments. The second condition in (12) follows.

Example 4. Applying Lemma 6 to our leading example in Figure 1 (with segment priorities decreasing alphabetically) will result in the following blocking times for local segments: $B(a_1) = \max\{E(b_1), E(c_1)\}$, $B(b_1) = E(c_1)$, $B(c_1) = 0$. Notice that Lemma 6 ignores the fact that $c_1$ may block on $d_1$, since it is taken into account in the $E(c_1)$ term in Theorem 2.

Theorem 2.
Under PSRP, the WCRT of a local segment $\tau_{i,j} \in S^L$, measured since the arrival of the parent task, is bounded by

$$WCRT(\tau_{i,j}) = A(\tau_{i,j}) + w(\tau_{i,j}), \tag{15}$$

where $w(\tau_{i,j})$ is the smallest value which satisfies

$$w(\tau_{i,j}) = B(\tau_{i,j}) + E(\tau_{i,j}) + \sum_{\tau_{x,y} \in X} \left\lceil \frac{w(\tau_{i,j}) + J(\tau_{x,y})}{T_x} \right\rceil E(\tau_{x,y}), \tag{16}$$

where $J(\tau_{x,y}) = A(\tau_{x,y}) - \sum_{z < y} E_{x,z}$ is the activation jitter of segment $\tau_{x,y}$, and $X = \{\tau_{x,y} \mid \pi_x < \pi_i \wedge (R_{x,y} \cap R_{i,j} \cap R^L \neq \emptyset)\}$ is the set of higher priority segments which share a local resource with $\tau_{i,j}$.

Proof: As soon as a local segment is released, it will try to lock all its required resources in $R_{i,j}$. If any of the resources it requires are not available, it will block for $B(\tau_{i,j})$ given by (12). When $\tau_{i,j}$ is ready to resume after the initial blocking, we distinguish between two cases, depending on whether (i) $\tau_{i,j}$ requires only local resources, or (ii) $\tau_{i,j}$ requires at least one global resource.

In case (i), according to Corollary 1, segment $\tau_{i,j}$ will not wait inside of a resource queue, i.e. $wait(\tau_{i,j}) = 0$. During the time that the segment is blocked or executing, higher priority segments sharing local resources with $\tau_{i,j}$ can arrive and interfere with it. These segments must be local too, otherwise, according to Definition 5, segment $\tau_{i,j}$ would have been global. The inter-arrival time between two consecutive invocations of a higher priority segment $\tau_{x,y}$ is equal to its task's period, with the first arrival suffering an activation jitter $J(\tau_{x,y})$, which can be bounded by the activation time of $\tau_{x,y}$ minus the execution time of all the segments preceding it in $S_x$. Equation (16) follows.

In case (ii), during the time $\tau_{i,j}$ is blocked, higher priority segments may arrive. However, since $\tau_{i,j}$ requires a global resource, as soon as it becomes ready to execute it will be inserted into the resource queue of all resources in $R_{i,j}$ and start spinning at the highest priority on the single local preemptive resource which it requires (according to Lemma 3). The spinning time is included in the $E(\tau_{i,j})$ term in (16).
As soon as all the resources in $R_{i,j}$ are available, it will continue executing at the highest priority on the preemptive resource. Therefore, higher priority segments arriving during the time $\tau_{i,j}$ is waiting or executing (i.e. during the $E(\tau_{i,j})$ term) will not interfere with $\tau_{i,j}$. Since in this theorem we are providing an upper bound, equation (16) follows.

A local segment $\tau_{i,j}$ will be delayed (relative to the arrival of its parent task) by the WCRT of the previous segment (if any), represented by the $A(\tau_{i,j})$ term in (15).

C. Response time of tasks

Now that we know how to compute the WCRT of local and global segments, we can easily determine the WCRT of tasks.

Corollary 2. Under PSRP, the WCRT of a task $\tau_i \in \Gamma$ is given by the WCRT of the last segment in $S_i$.

Note that the WCRT of a segment depends on the activation time of another segment. In turn, the activation time of a segment depends on the WCRT of another segment. However, since the priority of all segments of a given task is the same, this mutual dependency problem can be solved by simply computing response and activation times in order from the highest priority task to the lowest priority task.

VII. EVALUATION

In this section we demonstrate the effectiveness of PSRP in exploiting the inherent parallelism of a platform comprised of multiple heterogeneous resources. We consider a task set where some task segments require several processors at the same time and also share non-preemptive resources. We schedule it using two approaches: (i) using PSRP, and (ii) by collapsing all the processors into one virtual processor, i.e. by treating the entire platform as a single resource, and applying uniprocessor scheduling (which we refer to as "Collapsed").
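The WCRT values compared in this evaluation rely on the recurrence (16) of Theorem 2, whose smallest solution is commonly found by fixed-point iteration. The following is a minimal sketch under assumptions: the function name `solve_w`, the `limit` cut-off (for example the deadline minus the activation time) and the numeric inputs are hypothetical, and the blocking term, extended execution times and jitters are taken as already computed.

```python
import math


def solve_w(blocking, extended_exec, interferers, limit):
    """Smallest w with w = B + E + sum(ceil((w + J) / T) * E_hp), or None.

    interferers: list of (jitter J, period T, extended execution time E_hp)
    for the higher priority segments in the set X of Theorem 2.
    Returns None when the iteration exceeds `limit` without converging.
    """
    w = blocking + extended_exec  # lower bound: no interference yet
    while w <= limit:
        nxt = blocking + extended_exec + sum(
            math.ceil((w + j) / t) * e for j, t, e in interferers)
        if nxt == w:
            return w  # fixed point reached
        w = nxt  # the right-hand side is monotone, so w only grows
    return None


# Hypothetical numbers: B = 1, E = 2, two higher priority segments.
w_ok = solve_w(1, 2, [(0, 10, 1), (1, 20, 3)], limit=50)
# An overloaded case in which the iteration never converges below the limit.
w_fail = solve_w(1, 2, [(0, 2, 2)], limit=100)
```

Starting from `B + E` and iterating upwards yields the smallest solution, because each iterate is a lower bound on every fixed point. Integer inputs are used here so that the equality test is exact.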

To the best of our knowledge, the second approach is currently the best alternative to PSRP for scheduling parallel tasks which can execute on arbitrary subsets of processors and share non-preemptive resources. We compute the WCRT of the complete task for the two approaches. The difference in response times represents potential utilization gain, which can be exploited by e.g. background tasks or tighter timing requirements.

The simulated task set $\Gamma$ represents a multimedia application, where video frames are captured periodically with period $T$ and subsequently processed by a set of filters. Some of the filters are computationally intensive, but can exploit functional parallelism and execute on several processors in parallel. The video frames are stored in a shared global memory. Each parallel filter loads the necessary frame data from the global memory into its local buffer, operates on it, and writes the result back to the global memory. The data is transferred using a DMA controller. The simulated platform corresponds to a PC with a multicore processor. For simplicity we assume no caches.

PSRP approach: The platform consists of $M$ processors $p_j$, $1 \le j \le M$, a global memory $m$, local memories accessible by individual processors (or groups of processors) where $m_i$ represents the memory region in a local memory allocated to task $\tau_i$, and a DMA controller $dma$ for transferring data between the global and local memories. It can be expressed in terms of our model as $P = \{p_1, p_2, \ldots, p_M\}$ and $N = \{dma, m, m_1, m_2, \ldots, m_{|\Gamma|}\}$.

We consider several scenarios. In each scenario we divide the processors into $H$ groups $P_g$, $1 \le g \le H$. Each group contains $W$ processors, with $H \cdot W = M$. On each group of processors we execute a set of $K$ parallel tasks. Each parallel task $\tau_i$ belonging to group $g$ is specified by $S(\tau_i) = \langle (0.5, \{dma, m, m_i\}), (5, P_g \cup \{m_i\}), (0.5, \{dma, m, m_i\}) \rangle$. On each processor $p_j \in P$ we also execute a sequential task $\tau_i$ with $S(\tau_i) = \langle (2, \{p_j, m_i\}) \rangle$. All tasks share the same period $T$, $\forall \tau_i \in \Gamma : D_i = T_i$, and the parallel tasks have higher priority than the sequential tasks.
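The PSRP task set described above can be written down as a small generator. This is only an illustrative sketch: the function name, the task and resource naming scheme (`tau1`, `p1`, `m1`, ...) and the dictionary layout are assumptions, while the execution times and resource sets follow the scenario description.

```python
def psrp_task_set(H, W, K):
    """Return {task name: list of (execution time, set of required resources)}."""
    tasks = {}
    for g in range(H):
        # Processor group P_g contains W consecutive processors.
        group = {f"p{g * W + j + 1}" for j in range(W)}
        for _ in range(K):
            i = len(tasks) + 1
            transfer = (0.5, {"dma", "m", f"m{i}"})  # DMA load or store
            # Parallel task: load, compute on the whole group, store.
            tasks[f"tau{i}"] = [transfer, (5, group | {f"m{i}"}), transfer]
    for j in range(H * W):
        i = len(tasks) + 1
        # One sequential task per processor, with its own local memory region.
        tasks[f"tau{i}"] = [(2, {f"p{j + 1}", f"m{i}"})]
    return tasks


task_set = psrp_task_set(H=2, W=2, K=3)
```

For `H = 2`, `W = 2`, `K = 3` this yields six parallel tasks and four sequential tasks; since the parallel tasks are generated first, a lower task index can stand in for their higher priority.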
Collapsed approach: We can model the Collapsed approach by replacing all processors by one preemptive resource $p$ and having each segment require at least the resource $p$. For a scenario with $H$, $W$ and $K$ defined above, the collapsed task set then consists of $H \cdot K$ tasks $\tau_i$ with $S(\tau_i) = \langle (0.5, \{p, dma, m, m_i\}), (5, \{p, m_i\}), (0.5, \{p, dma, m, m_i\}) \rangle$, and $H \cdot W$ tasks $\tau_i$ with $S(\tau_i) = \langle (2, \{p, m_i\}) \rangle$.

Figure 9 compares the maximum WCRT among all tasks in $\Gamma$ between the PSRP and Collapsed approaches for $H = 2$. We vary the number of tasks per processor group $K$ and the number of processors required by parallel tasks $W$. We have computed the WCRT for the PSRP approach using the analysis presented in this paper, and for the Collapsed approach using the Fixed Priority Preemptive Scheduling analysis [23]. The results show that PSRP experiences lower WCRT than the Collapsed approach. Moreover, since the difference in WCRT increases for larger values of $K$ and $W$, the benefits of PSRP increase with larger task sets and more parallelism, i.e. when tasks execute on more processors in parallel. The results therefore demonstrate that PSRP indeed outperforms the Collapsed approach.

Fig. 9. Comparison of WCRTs for cases (i) and (ii) for $H = 2$ and varying $W$ and $K$. The plot shows the maximum worst-case response time (vertical axis, 10 to 80) against the number of tasks in each processor group $K$ (horizontal axis, 1 to 5), with one curve each for Collapsed $W=4$, Collapsed $W=2$, PSRP $W=4$ and PSRP $W=2$.

VIII. DISCUSSION

In this section we discuss the pros and cons of the proposed system model and PSRP.

A. Multi-unit preemptive resources

We can model a homogeneous multiprocessor containing $n$ cores in two ways: as a preemptive resource $p$ with $N_p = n$, or as $n$ preemptive resources $p_1, p_2, \ldots, p_n$, with capacities $N_{p_i} = 1$, for all $1 \le i \le n$. Existing literature on parallel task scheduling on multiprocessors assumes the first option, where each task segment specifies a requirement for a number of units of a multi-unit resource $p$. The system is then responsible for allocating tasks to processors during runtime. This model will ignore potentially large migration overheads, e.g.
in memory intensive applications, as data locality cannot be guaranteed. Using the second approach, our model allows the task set to be partitioned upfront, e.g. optimizing data locality.

B. Non-preemptive execution on preemptive resources

When a preemptive resource $p$ is required by a segment which requires also another preemptive resource, then $p$ is marked as a global resource, resulting in non-preemptive execution on $p$. This may appear overly pessimistic, especially compared to the work in [20] which describes a preemptive gang scheduling algorithm. However, they assume independent tasks. In multiprocessor scheduling with shared resources it is critical to keep the holding time of global non-preemptive resources as short as possible (which is the rationale behind the spin-lock based approach to locking global resources in MSRP). If we were to schedule all preemptive resources preemptively, then segments requiring several preemptive resources in a chained fashion (illustrated in Figure 3) would increase the resource holding time. We therefore decided to limit preemptive execution to preemptive resources which are required by segments which do not require any other preemptive resource.

C. Pessimistic analysis for local segments

Theorem 2 describes the WCRT of a local segment. It treats all local segments alike, whether they require global resources or not. However, only a local segment which requires only local resources can be preempted by higher priority segments while it is executing. A segment which requires at least one global resource will be scheduled non-preemptively on all preemptive resources it requires. We can therefore lower the bound on WCRT of local segments which require global resources by ignoring the interference of higher priority tasks during the execution of those segments. For this purpose we can adopt the schedulability analysis for Fixed-Priority with Deferred Preemption Scheduling by [24].

D. Nested global critical sections

FMLP supports nested critical sections by means of resource groups. The resource groups partition the set of resources into independent subsets. Consequently, a task trying to access resource $r$ may become blocked on all resources in the resource group $G(r)$.

Under PSRP, if a task has nested critical sections we can move the inner critical sections outwards until they overlap exactly with the outermost critical section. This new task can be expressed in our system model, where segments require several resources at the same time. Each segment can be blocked only on resources which it requires (rather than the complete resource group). PSRP therefore provides a more flexible approach for dealing with nested global critical sections than FMLP.

In the worst case, under FMLP a segment requiring resource $r$ will be blocked by every task for the duration of the longest segment which requires a resource from $G(r)$, while under PSRP a segment will be indirectly blocked by all dependent segments (see Figure 3).

Under FMLP every time a job is resumed it may block on a local resource. This is the same for PSRP, where we have to include the blocking time for each segment (rather than once per task).

IX. CONCLUSION

In this paper we addressed the problem of multi-resource scheduling of parallel tasks with real-time constraints.
We proposed a new resource model, which classifies different resources (such as bus, processor, shared variable, etc.) as either a preemptive or non-preemptive multi-unit resource. We then presented a new scheduling algorithm called PSRP and the corresponding schedulability analysis. Simulation results based on an example application show that it can exploit the inherent parallelism of a platform comprised of multiple heterogeneous resources. Currently, PSRP requires that the preemptive resources have only a single unit. In the future we want to extend PSRP to handle multi-unit preemptive resources.

REFERENCES

[1] J. Ousterhout, "Scheduling techniques for concurrent systems," in International Conference on Distributed Computing Systems, 1982, pp. 22–30.
[2] J. H. Anderson and J. M. Calandrino, "Parallel real-time task scheduling on multicore platforms," in Real-Time Systems Symposium (RTSS), December 2006, pp. 89–100.
[3] P. Gai, G. Lipari, and M. D. Natale, "Minimizing memory utilization of real-time task sets in single and multi-processor systems-on-a-chip," in Real-Time Systems Symposium (RTSS), 2001, pp. 73–83.
[4] E. W. Dijkstra, "The mathematics behind the Banker's Algorithm," in Selected Writings on Computing: A Personal Perspective. Springer-Verlag, 1982, pp. 308–312.
[5] A. N. Habermann, "Prevention of system deadlocks," Commun. ACM, vol. 12, pp. 373–382, July 1969.
[6] R. Rajkumar, L. Sha, and J. Lehoczky, "Real-time synchronization protocols for multiprocessors," in Real-Time Systems Symposium (RTSS), 1988, pp. 259–269.
[7] A. Block, H. Leontyev, B. B. Brandenburg, and J. H. Anderson, "A flexible real-time locking protocol for multiprocessors," in International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), 2007, pp. 47–56.
[8] P. Gai, M. Di Natale, G. Lipari, A. Ferrari, C. Gabellini, and P. Marceca, "A comparison of MPCP and MSRP when sharing resources in the Janus multiple-processor on a chip platform," in Real-Time and Embedded Technology and Applications Symposium (RTAS), 2003, pp. 189–198.
[9] B. B. Brandenburg, J. M. Calandrino, A. Block, H. Leontyev, and J. H. Anderson, "Real-time synchronization on multiprocessors: To block or not to block, to suspend or spin?" in Real-Time and Embedded Technology and Applications Symposium (RTAS), 2008, pp. 342–353.
[10] B. B. Brandenburg and J. H. Anderson, "A comparison of the M-PCP, D-PCP, and FMLP on LITMUS^RT," in International Conference on Principles of Distributed Systems (OPODIS), 2008, pp. 105–124.
[11] K. Lakshmanan, D. de Niz, and R. Rajkumar, "Coordinated task scheduling, allocation and synchronization on multiprocessors," in Real-Time Systems Symposium (RTSS), 2009, pp. 469–478.
[12] J. W. Havender, "Avoiding deadlock in multitasking systems," IBM Systems Journal, vol. 7, no. 2, pp. 74–84, 1968.
[13] T. P. Baker, "Stack-based scheduling for realtime processes," Real-Time Systems, vol. 3, no. 1, pp. 67–99, 1991.
[14] K. Tindell and J. Clark, "Holistic schedulability analysis for distributed hard real-time systems," Microprocess. Microprogram., vol. 40, no. 2-3, pp. 117–134, 1994.
[15] J. J. G. García, J. C. P. Gutiérrez, and M. G. Harbour, "Schedulability analysis of distributed hard real-time systems with multiple-event synchronization," in Euromicro Conference on Real-Time Systems (ECRTS), 2000, pp. 15–24.
[16] D. G. Feitelson, "Distributed hierarchical control for parallel processing," Computer, vol. 23, no. 5, pp. 65–77, 1990.
[17] D. G. Feitelson and L. Rudolph, "Gang scheduling performance benefits for fine-grain synchronization," Journal of Parallel and Distributed Computing, vol. 16, no. 4, pp. 306–318, 1992.
[18] X. Li and M. Malek, "Analysis of speedup and communication/computation ratio in multiprocessor systems," in Real-Time Systems Symposium (RTSS), Dec 1988, pp. 282–288.
[19] G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," in Spring Joint Computer Conference, 1967, pp. 483–485.
[20] S. Kato and Y. Ishikawa, "Gang EDF scheduling of parallel task systems," in Real-Time Systems Symposium (RTSS), 2009, pp. 459–468.
[21] K. Lakshmanan, S. Kato, and R. Rajkumar, "Scheduling parallel realtime tasks on multi-core processors," in Real-Time Systems Symposium (RTSS), 2010, pp. 259–268.
[22] A. Saifullah, K. Agrawal, C. Lu, and C. Gill, "Multi-core real-time scheduling for generalized parallel task models," in Real-Time Systems Symposium (RTSS), 2011, pp. 217–226.
[23] N. Audsley, A. Burns, M. Richardson, K. Tindell, and A. J. Wellings, "Applying new scheduling theory to static priority pre-emptive scheduling," Software Engineering Journal, vol. 8, no. 5, pp. 284–292, 1993.
[24] R. J. Bril, J. J. Lukkien, and W. F. J. Verhaegh, "Worst-case response time analysis of real-time tasks under fixed-priority scheduling with deferred preemption revisited," in Euromicro Conference on Real-Time Systems (ECRTS), 2007, pp. 269–279.