Performance Analysis of Greedy Shapers in Real-Time Systems

Ernesto Wandeler, Alexander Maxiaguine, Lothar Thiele
Computer Engineering and Networks Laboratory
Swiss Federal Institute of Technology (ETH)
8092 Zürich, Switzerland
{wandeler,maxiagui,thiele}@tik.ee.ethz.ch

Abstract

Traffic shaping is a well-known technique in the area of networking and is proven to reduce global buffer requirements and end-to-end delays in networked systems. Due to these properties, shapers also play an increasingly important role in the design of multi-processor embedded systems that exhibit a considerable amount of on-chip traffic. Despite their growing importance in this area, no methods exist to analyze shapers in distributed embedded systems, and to incorporate them into a system-level performance analysis. Hence it has until now not been possible to determine the effect of shapers on end-to-end delay guarantees or buffer requirements in these systems. In this work, we present a method to analyze greedy shapers, and we embed this analysis method into a well-established modular performance analysis framework. The presented approach enables system-level performance analysis of complete systems with greedy shapers, and we prove its applicability by analyzing two case study systems.

1 Introduction

In the area of broad-band networking, traffic shaping is a well-known and well-studied technique to regulate connections and to avoid buffer overflow in network nodes, see e.g. [3] or []. A traffic shaper in a network node buffers the data packets of an incoming traffic stream and delays them such that the output stream conforms to a given traffic specification. A shaper may ensure for example that the output stream has limited burstiness, or that packets on the output stream have a specified minimum inter-arrival time. A greedy shaper is a special instance of a traffic shaper that not only ensures an output stream that conforms to a given traffic specification, but also guarantees that no packets get delayed any longer than necessary.
By limiting the burstiness of the output stream of a network node, shapers typically reduce the buffer requirements on subsequent network nodes drastically. And if some sort of priority scheduling is used on a network node to share bandwidth among several incoming streams, then a limited burstiness of high-priority streams leads to better responsiveness of lower-priority streams. In addition, under some circumstances, shaping comes for free from a performance point of view. To be more specific, if the output stream of a node is shaped with a greedy shaper to conform again to the input traffic specification, and if the buffer of the shaper accesses the same memory as the input buffer of the node, then the end-to-end delay of the stream and the total buffer requirements on the network node are not affected by adding the shaper.

Due to these favorable properties, shapers also play an increasingly important role in the design of real-time embedded systems, particularly since modern embedded systems are often implemented as multi-processor systems with a considerable amount of on-chip traffic. In this domain, we may identify two main application areas for traffic shaping. First, shapers may be used internally, to re-shape internal traffic streams to reduce global buffer requirements and end-to-end delays, and secondly, shapers may be added at the boundaries of a system, to ensure conformant input streams and to thereby prevent internal buffer overflows caused by malicious input. Figure 1 shows two simple example systems from these two application areas.

Figure 1. Two systems with greedy shapers (left: Internal Re-Shaping, with two CPUs that send over a shared bus through their CNIs; right: External Input-Shaping, with three shaped input streams entering an MPSoC CPU).

The analysis of traffic shapers in communication networks is well-known []. But to our best knowledge, none of the existing frameworks for modular system level performance analysis of real-time embedded systems considers traffic shapers at this time, see e.g. [5], [7] or [1, 9]. Only [7] introduces a restricted kind of traffic shaping through so-called event adaption functions (EAFs). But EAFs play a crucial role in the fundamental ability of [7] to analyze systems, and a designer therefore has very limited freedom to place, omit, or parameterize EAFs.

3-9810801-0-6/DATE06 © 2006 EDAA

In this work, we will extend the framework presented in [1, 9], to enable system level performance analysis of real-time embedded systems with traffic shapers. It has to be noted here that in [4], Le Boudec and Thiran challenge the ability of the methods in [9] to analyze traffic shapers, and in [8], Schiøler et al. even claim that it is not possible to analyze traffic shapers within the framework of [1, 9].

Contributions of this work:

- We present a method to analyze greedy shapers in the area of multi-processor embedded systems.
- We embed this new analysis method into the well established modular performance analysis framework of [1, 9]. This enables system-level performance analysis of complete systems with greedy shapers, i.e. amongst others, we may analyze end-to-end delay guarantees and global buffer requirements of such systems.
- We prove the applicability of the presented methods by analyzing two small case study systems with greedy shapers.

2 Modular Performance Analysis

In the domain of communication networks, powerful abstractions have been developed to model the flow of data through a network. In particular Network Calculus [4] provides means to deterministically reason about timing properties of data flows in queuing networks. Real-Time Calculus [9] extends the basic concepts of Network Calculus to the domain of real-time embedded systems, and in [1] a unifying approach to Modular Performance Analysis with Real-Time Calculus has been proposed. It is based on a general event and resource model, allows for hierarchical scheduling and arbitration, and takes computation and communication resources into account.
In the following, we introduce some concepts of Network and Real-Time Calculus.

2.1 A General Event Stream Model

A trace of an event stream can be described by means of a cumulative function $R(t)$, defined as the number of events seen on the event stream in the time interval $[0,t]$. While any $R$ always describes one concrete trace of an event stream, a tuple $\alpha(\Delta) = [\alpha^u(\Delta), \alpha^l(\Delta)]$ of upper and lower arrival curves [] provides an abstract event stream model, representing all possible traces of an event stream. $\alpha^u(\Delta)$ provides an upper bound on the number of events seen on the event stream in any time interval of length $\Delta$, and analogously, $\alpha^l(\Delta)$ denotes a lower bound on the number of events in a time interval $\Delta$. $R$, $\alpha^u$ and $\alpha^l$ are related to each other as follows:

\[ \alpha^l(t-s) \le R(t) - R(s) \le \alpha^u(t-s) \qquad \forall\, s < t \tag{1} \]

with $\alpha^l(0) = \alpha^u(0) = 0$.

Arrival curves substantially generalize traditional event models such as sporadic, periodic, periodic with jitter, or any other arrival pattern with deterministic timing behavior. For example an event stream with a period $p$, a jitter $j$, and a minimum inter-arrival distance $d$, can be modeled by the following arrival curves:

\[ \alpha^l(\Delta) = \left\lfloor \frac{\Delta - j}{p} \right\rfloor ; \qquad \alpha^u(\Delta) = \min\left\{ \left\lceil \frac{\Delta + j}{p} \right\rceil, \left\lceil \frac{\Delta}{d} \right\rceil \right\} \tag{2} \]

2.2 A General Resource Model

Analogously to the cumulative function $R(t)$, the concrete availability of a computation or communication resource can be described by a cumulative function $C(t)$, defined as the number of available resources, e.g. processor or bus cycles, in the time interval $[0,t]$. To provide an abstract resource model, we define a tuple $\beta(\Delta) = [\beta^u(\Delta), \beta^l(\Delta)]$ of upper, $\beta^u$, and lower, $\beta^l$, service curves. Then, $C$, $\beta^u$ and $\beta^l$ are related to each other as follows:

\[ \beta^l(t-s) \le C(t) - C(s) \le \beta^u(t-s) \qquad \forall\, s < t \tag{3} \]

with $\beta^l(0) = \beta^u(0) = 0$.

2.3 From Components to Abstract Components

In a real-time system, an incoming event stream is typically processed on a sequence of HW/SW components, that we will interpret as tasks that are executed on possibly different hardware resources.

Figure 2. A component and its abstraction (left: concrete component with $R(t)$, $C(t)$, $R'(t)$, $C'(t)$; right: abstract component with $\alpha(\Delta)$, $\beta(\Delta)$, $\alpha'(\Delta)$, $\beta'(\Delta)$).

Figure 2 shows on the left side such a component.
An event stream $R(t)$ enters the component and is processed using a hardware resource whose availability is modeled by $C(t)$. After being processed, the events are emitted on the component's output, resulting in an outgoing event stream $R'(t)$, and the remaining resources that were not consumed are made available to other components and are described by an outgoing resource availability trace $C'(t)$.

The relations between $R(t)$, $C(t)$, $R'(t)$ and $C'(t)$ depend on the component's processing semantics, and the outgoing event stream $R'(t)$ will typically not equal the incoming event stream $R(t)$, as it may, for example, exhibit more or less jitter. Analogously, $C'(t)$ will differ from $C(t)$.

For modular performance analysis with real-time calculus, we model such a HW/SW component as an abstract component as shown on the right side of Fig. 2. Here, an abstract event stream $\alpha(\Delta)$ enters the abstract component and is processed using an abstract hardware resource $\beta(\Delta)$. The output is then again an abstract event stream $\alpha'(\Delta)$, and the remaining resources are expressed again as an abstract hardware resource $\beta'(\Delta)$.

Internally, an abstract component is specified by a set of relations, that relate the incoming arrival and service curves to the outgoing arrival and service curves:

\[ \alpha' = f_\alpha(\alpha, \beta) \qquad \beta' = f_\beta(\alpha, \beta) \]

Again, these relations depend on the processing semantics of the modeled component, and must be determined such that $\alpha'(\Delta)$ correctly models the event stream with event trace $R'(t)$ and that $\beta'(\Delta)$ correctly models the resource availability $C'(t)$.

As an example, consider a component modeling a task that greedily uses the resources offered to it. This component can be described by the relations $f_\alpha$ as follows¹ [1]:

\[ \alpha^{u\prime} = \min\{(\alpha^u \otimes \beta^u) \oslash \beta^l,\ \beta^u\} \tag{4} \]
\[ \alpha^{l\prime} = \min\{(\alpha^l \oslash \beta^u) \otimes \beta^l,\ \beta^l\} \tag{5} \]

Such a component is very common in the area of real-time embedded systems, and we will refer to it as a Fixed Priority (FP) component.

2.4 Abstract Performance Models

To analyze the performance of a concrete system, we need to capture its essential properties in an abstract performance model, that consists of a set of inter-connected abstract components.
For this, first all concrete system components are modeled using their abstract representation (as described in the preceding section). And then, the arrival curve inputs and outputs of these abstract components are inter-connected to reflect the flow of event streams through the system.

¹ See the Appendix for a definition of $\otimes$ and $\oslash$.

When several components of the concrete system are allocated to the same hardware resource, they must share this resource according to a scheduling policy. In the performance model, the scheduling policy on a resource can be expressed by the way the abstract resources $\beta$ are distributed among the different abstract components. For example, consider preemptive fixed priority scheduling: Abstract component A with the highest priority may use all available resources on a hardware, whereas abstract component B with the second highest priority only gets the resources that were not consumed by A. This is modeled by using the service curves $\beta'_A$ that exit A as input to B. For some other scheduling policies, such as GPS or TDMA, resources must be distributed differently, while for some scheduling policies, such as EDF or non-preemptive scheduling, different abstract components, with tailored internal relations, must be used.

2.5 Analysis

In the performance model of a system, various performance measures can be computed analytically. For instance, for an FP component the maximum delay $d_{max}$ experienced by an event is bounded by [, 1]:

\[ d_{max} \le \sup_{\lambda \ge 0} \big\{ \inf\{\tau \ge 0 : \alpha^u(\lambda) \le \beta^l(\lambda + \tau)\} \big\} \stackrel{def}{=} Del(\alpha^u, \beta^l) \tag{6} \]

and when processed by a sequence of components, the total end-to-end delay experienced by an event is bounded by []:

\[ d_{max} \le Del(\alpha^u, \beta^l_1 \otimes \beta^l_2 \otimes \ldots \otimes \beta^l_n) \tag{7} \]

Similarly, the maximum buffer space $b_{max}$ required to buffer an event stream in front of such an FP component is bounded by:

\[ b_{max} \le \sup_{\lambda \ge 0} \{\alpha^u(\lambda) - \beta^l(\lambda)\} \stackrel{def}{=} Buf(\alpha^u, \beta^l) \tag{8} \]

and when the buffers of consecutive components access the same shared memory, the total buffer space is bounded by:

\[ b_{max} \le Buf(\alpha^u, \beta^l_1 \otimes \beta^l_2 \otimes \ldots \otimes \beta^l_n) \tag{9} \]

3 Performance Analysis of Greedy Shapers

To enable analysis of systems with greedy shapers in the Modular Performance Analysis framework, we need to introduce a new abstract component that models a greedy shaper, as depicted in Fig. 3. We will first explain the behavior and the implementation of concrete greedy shapers, and will then introduce the internal relations for abstract greedy shapers.
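Before turning to shapers, the delay and buffer bounds Del and Buf from the analysis section can be evaluated numerically. The following is a minimal sketch, assuming curves sampled on an integer grid, so that delays are rounded up to whole grid steps. The function names `minplus_conv`, `del_bound` and `buf_bound`, the token-bucket arrival curve and the two rate-latency service curves are illustrative assumptions and are not taken from the case studies.

```python
from typing import Callable

Curve = Callable[[int], float]  # a curve sampled at integer interval lengths


def minplus_conv(f: Curve, g: Curve) -> Curve:
    """Min-plus convolution: (f ⊗ g)(d) = min over 0 <= lam <= d of f(d - lam) + g(lam)."""
    return lambda d: min(f(d - lam) + g(lam) for lam in range(d + 1))


def buf_bound(alpha_u: Curve, beta_l: Curve, horizon: int) -> float:
    """Buf: largest vertical distance alpha_u - beta_l over 0..horizon."""
    return max(alpha_u(lam) - beta_l(lam) for lam in range(horizon + 1))


def del_bound(alpha_u: Curve, beta_l: Curve, horizon: int) -> int:
    """Del: largest horizontal distance, rounded up to the integer grid."""
    worst = 0
    for lam in range(horizon + 1):
        tau = 0
        # Smallest tau with alpha_u(lam) <= beta_l(lam + tau).
        while beta_l(lam + tau) < alpha_u(lam):
            tau += 1
            if tau > 10 * horizon:
                raise ValueError("service curve never catches up")
        worst = max(worst, tau)
    return worst


# Hypothetical curves: burst 4 and rate 1 for the events,
# rate 2 after latency 3 and rate 3 after latency 1 for the resources.
def alpha_u(d: int) -> float:
    return 0 if d == 0 else 4 + d


def beta_1(d: int) -> float:
    return 2 * max(0, d - 3)


def beta_2(d: int) -> float:
    return 3 * max(0, d - 1)


single_delay = del_bound(alpha_u, beta_1, 50)    # one component
single_buffer = buf_bound(alpha_u, beta_1, 50)
chain = minplus_conv(beta_1, beta_2)             # two components in sequence
chain_delay = del_bound(alpha_u, chain, 50)
chain_buffer = buf_bound(alpha_u, chain, 50)
```

For these particular values the grid results coincide with the well-known closed forms for a token-bucket arrival curve and a rate-latency service curve (latency plus burst over rate for the delay, burst plus rate times latency for the backlog); with other parameters the integer grid would only give an approximation rounded up to the next grid point.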

Figure 3. A greedy shaper and its abstraction (left: concrete greedy shaper with shaping curve $\sigma$, input $R(t)$ and output $R'(t)$; right: abstract greedy shaper GS with shaping curve $\sigma$, input $\alpha(\Delta)$ and output $\alpha'(\Delta)$).

3.1 Concrete Greedy Shapers

A greedy shaper with a shaping curve $\sigma$ delays events of an input event stream, so that the output event stream has $\sigma$ as an upper arrival curve, and it outputs all events as soon as possible.

Consider a greedy shaper with shaping curve $\sigma$, which is sub-additive and with $\sigma(0) = 0$. Assume that the shaper buffer is empty at time 0, and that it is large enough so that there is no event loss. In [4], Le Boudec and Thiran proved that for an input event trace $R$ to such a greedy shaper, the output event trace $R'$ is given by:

\[ R' = R \otimes \sigma \tag{10} \]

In practice, a greedy shaper with a shaping curve $\sigma(\Delta) = \min_i \{b_i + r_i \Delta\}$ with $\sigma(0) = 0$ can be implemented using a cascade of leaky buckets. Every leaky bucket has a bucket size $b_i$ and a leaking rate $r_i$, and the leaky buckets are arranged with decreasing leaking rate within the cascade. Initially all buckets are empty. When an event arrives at a leaky bucket stage, a token is generated. If there is enough space in the bucket, the token is put into the bucket and the event is sent to the next stage immediately. Otherwise, the event is buffered until the bucket has emptied enough to put the token in.

3.2 Abstract Greedy Shapers

Theorem 1 (Abstract Greedy Shapers) Assume an event stream that can be modeled as an abstract event stream with arrival curves $[\alpha^u, \alpha^l]$ serves as input to a greedy shaper with a sub-additive shaping curve $\sigma$ with $\sigma(0) = 0$. Then, the output of the greedy shaper is an event stream that can be modeled as an abstract event stream with arrival curves

\[ \alpha^u_{GS} = \alpha^u \otimes \sigma \tag{11} \]
\[ \alpha^l_{GS} = \alpha^l \otimes (\sigma \,\overline{\oslash}\, \sigma) \tag{12} \]

Further, the maximum delay and the maximum backlog at the greedy shaper are bounded by

\[ d_{max,GS} = Del(\alpha^u, \sigma) \tag{13} \]
\[ b_{max,GS} = Buf(\alpha^u, \sigma) \tag{14} \]

Proof: To prove (11) we use the fact that $R \oslash R$ is the minimum upper arrival curve of a cumulative function $R$, and we use the properties

\[ (f \oslash g) \oslash h = f \oslash (g \otimes h) \qquad\qquad (f \otimes g) \oslash g \le f \otimes (g \oslash g) \]

that were proven in [].
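We can then compute

\[ R' \oslash R' = (R \otimes \sigma) \oslash (R \otimes \sigma) = ((R \otimes \sigma) \oslash R) \oslash \sigma = ((\sigma \otimes R) \oslash R) \oslash \sigma \le (\sigma \otimes (R \oslash R)) \oslash \sigma \le (\sigma \otimes \alpha^u) \oslash \sigma = (\alpha^u \otimes \sigma) \oslash \sigma = \alpha^u \otimes \sigma \]

To prove (12) we use the fact that $R \,\overline{\oslash}\, R$ is the maximum lower arrival curve of a cumulative function $R$. We can then compute

\[ (R' \,\overline{\oslash}\, R')(\mu) = \big((R \otimes \sigma) \,\overline{\oslash}\, (R \otimes \sigma)\big)(\mu) = \inf_{\lambda \ge 0}\ \sup_{0 \le v \le \lambda}\ \inf_{0 \le u \le \lambda + \mu} \{R(u) - R(v) + \sigma(\mu + \lambda - u) - \sigma(\lambda - v)\} \]

When we separately evaluate this formula for $u \le v$, for $v \le u \le v + \mu$ and for $v + \mu \le u \le \lambda + \mu$, we get

\[ (R \otimes \sigma) \,\overline{\oslash}\, (R \otimes \sigma) \ge \min\{(R \,\overline{\oslash}\, R) \otimes (\sigma \,\overline{\oslash}\, \sigma),\ R \,\overline{\oslash}\, R,\ \sigma \,\overline{\oslash}\, \sigma\} = (R \,\overline{\oslash}\, R) \otimes (\sigma \,\overline{\oslash}\, \sigma) \]

The complete proofs for (13) and (14) are omitted here due to space restrictions, but they were deduced starting from the following relations:

\[ d(t) = \inf\{\tau \ge 0 : R(t) \le R'(t + \tau)\} = \inf\Big\{\tau \ge 0 : \inf_{0 \le u \le t + \tau} \{\sigma(t + \tau - u) + R(u) - R(t)\} \ge 0\Big\} \]
\[ b(t) = R(t) - R'(t) = R(t) - (\sigma \otimes R)(t) = \sup_{0 \le u \le t} \{R(t) - R(u) - \sigma(t - u)\} \]

Relations (11) and (12) can now be used as internal relations of an abstract greedy shaper, and (13) and (14) can be used to analyze delay guarantees and buffer requirements of greedy shapers in a performance model.

4 Applications & Case Studies

In this section, we analyze the two system designs depicted in Fig. 1. The analysis results will clearly reveal the positive influence of greedy shapers on a system's performance and buffer requirements when applied internally, or on a system's robustness when applied externally.

We deliberately chose two small system designs that clearly focus on the influence of the greedy shapers, and that do not dilute the analysis results by any influences of other system properties that might be hard to recognize. Modular Performance Analysis with Real-Time Calculus was however already used several times to analyze bigger and more complex system designs, and the abstract greedy shapers can seamlessly be integrated into bigger performance models.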
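As a numerical sanity check of Theorem 1 before the case studies, the relations of the greedy shaper can be tried out on a sampled trace. The sketch below is an illustration under stated assumptions rather than part of the analysis framework: time is discretized into unit slots, the input trace `R_in` and the token-bucket shaping curve (burst 2, rate 1) are made-up values, and `upper_curve` computes the tightest upper arrival curve of the finite trace only.

```python
from typing import List


def minplus_conv(f: List[int], g: List[int]) -> List[int]:
    """(f ⊗ g)[t] = min over 0 <= u <= t of f[u] + g[t - u]."""
    return [min(f[u] + g[t - u] for u in range(t + 1)) for t in range(len(f))]


def upper_curve(R: List[int]) -> List[int]:
    """Tightest upper arrival curve of a finite trace: max over s of R[s + d] - R[s]."""
    n = len(R)
    return [max(R[s + d] - R[s] for s in range(n - d)) for d in range(n)]


# Hypothetical cumulative input trace: a burst of 6 events, later a burst of 3.
R_in = [0, 6, 6, 6, 6, 6, 6, 6, 9, 9, 9, 9]

# Hypothetical sub-additive shaping curve with sigma[0] = 0 (burst 2, rate 1).
sigma = [0] + [2 + d for d in range(1, len(R_in))]

# Concrete greedy shaper: output trace R' = R ⊗ sigma.
R_out = minplus_conv(R_in, sigma)

# Abstract view: arrival curve of the input, and the output bound alpha_u ⊗ sigma.
alpha_in = upper_curve(R_in)
alpha_out = upper_curve(R_out)
alpha_bound = minplus_conv(alpha_in, sigma)

# Backlog observed in the shaper versus the bound Buf(alpha_u, sigma).
max_backlog = max(r - r_out for r, r_out in zip(R_in, R_out))
buf = max(a - s for a, s in zip(alpha_in, sigma))
```

On this trace the output conforms to `sigma`, its arrival curve stays below `alpha_bound`, and the observed backlog does not exceed `buf`. The lower curve of the theorem could be checked in the same way with an infimum-based deconvolution.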

4.1 Internal Shaping for System Improvement

Consider a distributed real-time system with 2 CPUs that communicate via a shared bus, as depicted on the left side in Fig. 1. CPU 1 and CPU 2 process an incoming event stream S1 and S2, respectively, and send the resulting event streams S1' and S2' via the shared bus to other components. The shared bus implements a fixed-priority protocol, where sending the events from CPU 1 has priority over sending the events from CPU 2. Events that are ready to be sent get buffered in the communication network interfaces CNI 1 and CNI 2 that connect CPU 1 and CPU 2 with the shared bus.

In this system, S1' may differ considerably from S1. For example S1' may be bursty even when S1 is a strictly periodic event stream. This may happen for example if, besides T_S1, other tasks are executed on CPU 1 using a TDMA scheduling policy, or if FP scheduling is used and T_S1 has a low priority. In both cases, the processor may not be available to T_S1 during some time interval in which all arriving events of S1 get buffered, and it may be fully available to T_S1 during a later time interval in which all the buffered events will be processed and emitted, leading to a burst on S1'.

Now suppose that event stream S1' is bursty. Whenever a burst of events arrives on S1', the shared bus gets fully occupied until all buffered events of S1' are sent. During this period, event stream S2' will receive no service, and S2' will experience a delay caused by the burstiness of S1'. Moreover, the buffer demand in CNI 2 will also increase with increasing burstiness of S1'.

In this system, it may be an interesting option to place a greedy shaper at the output of CPU 1, that shapes event stream S1'. This greedy shaper will limit the burstiness of S1', and will therefore reduce the influence of CPU 1 and S1' on the delay of S2' and the buffer requirements of CNI 2.

To investigate the effect of adding greedy shapers to the system with internal re-shaping in Fig. 1, we analyze it with Modular Performance Analysis, using the abstract greedy shaper component that we introduced in Section 3.
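The qualitative argument above can be tried out with a small slot-based simulation. This is only a toy sketch under assumptions that are not part of the case study: the bus sends one event per slot, the high-priority stream emits a single burst of 4 events, the low-priority stream emits one event every second slot, and the shaping curve allows one event every two slots. The helper names `greedy_shaper` and `max_low_priority_delay` are hypothetical.

```python
from collections import deque
from math import ceil
from typing import List


def greedy_shaper(arrivals: List[int], sigma: List[int]) -> List[int]:
    """Per-slot releases of a greedy shaper, computed as R' = R ⊗ sigma."""
    n = len(arrivals)
    R = [0]
    for a in arrivals:
        R.append(R[-1] + a)  # R[t] = events that arrived in slots 0..t-1
    R_out = [min(R[u] + sigma[t - u] for u in range(t + 1)) for t in range(n + 1)]
    return [R_out[t + 1] - R_out[t] for t in range(n)]


def max_low_priority_delay(high: List[int], low: List[int]) -> int:
    """Fixed-priority bus sending one event per slot; worst delay of `low` in slots."""
    q_high = 0
    q_low = deque()  # arrival slots of the waiting low-priority events
    worst = 0
    t = 0
    while t < len(high) or q_high or q_low:
        if t < len(high):
            q_high += high[t]
            q_low.extend([t] * low[t])
        if q_high:
            q_high -= 1  # the high-priority stream always goes first
        elif q_low:
            worst = max(worst, t - q_low.popleft())
        t += 1
    return worst


# Hypothetical traffic: a burst of 4 high-priority events in slot 0,
# one low-priority event in each of the slots 0, 2, 4 and 6.
high = [4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
low = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

# Hypothetical shaping curve: at most one event per two slots.
sigma = [ceil(d / 2) for d in range(len(high) + 1)]

shaped_high = greedy_shaper(high, sigma)
delay_without_shaper = max_low_priority_delay(high, low)
delay_with_shaper = max_low_priority_delay(shaped_high, low)
```

In this toy setting the worst delay of the low-priority stream is smaller once the burst is shaped. The sketch does not model the CPUs or their buffers, so it says nothing about the delay of the shaped stream itself or about the shared-memory effect discussed in the case study.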
We assume that S1 and S2 are both strictly periodic with a period p = 1ms. In both CPUs, the CPU may not be available to process the tasks T_Si for up to 5ms. After this period of at most 5ms, the processor is fully available and can process 5 events per ms (β^u_CPU1 = β^u_CPU2 = 5 [e/ms], β^l_CPU1 = β^l_CPU2 = max{, 5} [e/ms]). The bus can send .5 events per ms (β^u_BUS = β^l_BUS = .5 [e/ms]).

With this specification, we analyze four different system designs. First, we analyze the system without greedy shapers. Secondly, we place a greedy shaper only at the output of CPU 1 to shape S1'. Then, we place a greedy shaper only at the output of CPU 2 to shape S2'. Finally, we add two greedy shapers to shape both S1' and S2'. We use the upper arrival curves $\alpha^u_{S1}$ and $\alpha^u_{S2}$ as shaping curves $\sigma_1$ and $\sigma_2$, respectively, and we assume that the buffers of the greedy shapers and the corresponding processing tasks access the same memory. On the left side of Fig. 4, the abstract performance model of the fourth system design is depicted.

Figure 4. Performance models (left: Internal Re-Shaping; right: External Input-Shaping).

Using the four performance models, we analyzed the maximum required buffer spaces of the different buffers, as well as the end-to-end delays of both event streams S1 and S2. The results are shown in Table 1.

Table 1. Effect of Re-Shaping.
shapers | buffer: CPU 1, CPU 2, CNI 1, CNI 2, Tot | delay: S1, S2
none    9 5 5. 9
S1      1 19 5 5.
        % - - 75% 33% % 7.% 3%
S2      5..
        % - - - 5% % -.%
both    1 1 1 5 5.
        % - - 75% 9% % 7.% %

From the results, we learn that placing greedy shapers helps to reduce the total buffer requirements from 5 down to 1 events that need to be buffered at most. Moreover, the greedy shapers also reduce the end-to-end delay of both event streams, namely by 7.% for S1, and by a total of % for S2.

When we look at the results, we also recognize the well-known property of greedy shapers that re-shaping is for free [].
Since we use $\sigma_1 = \alpha^u_{S1}$ and $\sigma_2 = \alpha^u_{S2}$, the greedy shapers effectively only re-shape S1' and S2', and therefore the buffer requirements of CPU 1 and CPU 2 are not affected by adding the greedy shapers.

4.2 Input-Shaping for Separation of Concerns

Typical large embedded systems often process several event streams in parallel. To achieve separation of concerns in such systems, they are often implemented using time-triggered scheduling policies, or servers. While these scheduling policies help to decouple the influence of the various event streams on each other, they often do not use the available resources efficiently.

On the other hand, powerful methods were developed to analyze systems with event-triggered scheduling policies, such as RM or EDF. In these systems, resources are used efficiently, but on the downside, the various event streams may heavily influence each other. Slight changes in the timing behavior of a high-priority stream may increase the total delay of a lower-priority stream considerably, possibly leading to a missed deadline, or to buffer overflows somewhere in the system.

To overcome this problem, greedy shapers may be placed at the input to such systems. Every incoming event stream S_i gets shaped with an individual shaping curve $\sigma_i$ that corresponds to its design-time timing specification. The system can then be analyzed using the design-time timing specifications, and at run-time, non-adherence of S_i to its timing specification will have no influence on the delay of any other event stream, but will at most increase the total delay of S_i itself. Moreover, no buffers will overflow inside the system. Instead, only the buffers of the greedy shapers themselves may overflow. But since these buffers are clearly localized at the boundary of the system, individual handling policies could easily be implemented.

Let us assume a real-time system as shown on the right side of Fig. 1. Here, a single CPU processes three event streams with a fixed-priority scheduling policy. The high-priority stream S1 is strictly periodic with p1 = 5ms, the medium-priority stream S2 is strictly periodic with p2 = 1ms, and the low-priority stream S3 is strictly periodic with p3 = ms. The CPU processes .35 events per ms.

To illustrate the influence of greedy shapers at the input of such a system, we add a jitter of j1 = .1ms to stream S1, and we then analyze the effect of this on the end-to-end delays of the three event streams, both with and without greedy shapers. The results, computed using Modular Performance Analysis, are shown in Table 2.

Table 2. Effect of Input-Shaping.
        | Without Shaping: d1, d2, d3 | With Shaping: d1, d2, d3
j 1 =..57..57
j 1 =.1..57.57.9.57
% +3% +3.5%

Looking at the results, we clearly see the big influence of the small non-adherence of S1 on the maximum delay of the completely independent stream S3, if no input shaping is applied. On the other hand, we observe that input shaping effectively isolates the other event streams from the influence of the malicious input stream S1. Now, only S1 is affected by its own misbehavior.

5 Conclusions

We introduced a new method to analyze greedy shapers, and we embedded this method into the modular performance analysis framework of [1, 9], by introducing a new abstract component that models a greedy shaper. This approach enables system level performance analysis of real-time systems with greedy shapers.

We proved the applicability of the presented methods through performance analysis of two case study systems with greedy shapers. In these case study systems, we could analyze the detailed buffer requirements of all system components, and we could provide end-to-end delay guarantees for the processed event streams. The analysis thereby clearly revealed the positive influence of greedy shapers on the system's performance and buffer requirements.

Appendix: Min-Max Algebra

The operators $\otimes$, $\oslash$ and $\overline{\oslash}$ are defined as:

\[ (f \otimes g)(\Delta) = \inf_{0 \le \lambda \le \Delta} \{f(\Delta - \lambda) + g(\lambda)\} \tag{15} \]
\[ (f \oslash g)(\Delta) = \sup_{\lambda \ge 0} \{f(\Delta + \lambda) - g(\lambda)\} \tag{16} \]
\[ (f \,\overline{\oslash}\, g)(\Delta) = \inf_{\lambda \ge 0} \{f(\Delta + \lambda) - g(\lambda)\} \tag{17} \]

A curve $\sigma$ is sub-additive, if

\[ \sigma(a) + \sigma(b) \ge \sigma(a + b) \qquad \forall\, a, b \ge 0 \tag{18} \]

Acknowledgements

This research has been funded by the Swiss National Science Foundation (SNF) under the Analytic Performance Estimation of Embedded Computer Systems project 1-135/1, and by ARTIST.

References

[1] S. Chakraborty, S. Künzli, and L. Thiele. A general framework for analysing system properties in platform-based embedded system designs. In Proc. 6th Design, Automation and Test in Europe (DATE), pages 190-195, March 2003.
[2] R. Cruz. A calculus for network delay. IEEE Trans. Information Theory, 37(1):114-141, 1991.
[3] S. Gringeri, K. Shuaib, R. Egorov, A. Lewis, B. Khasnabish, and B. Basch. Traffic shaping, bandwidth allocation, and quality assessment for MPEG video distribution over broadband networks. IEEE Networks, 12(6):94-107, 1998.
[4] J. Le Boudec and P. Thiran. Network Calculus - A Theory of Deterministic Queuing Systems for the Internet. LNCS 2050, Springer Verlag, 2001.
[5] P. Pop, P. Eles, and Z. Peng. Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems. In Design, Automation and Test in Europe (DATE 03), pages 184-189, 2003.
[6] J. Rexford, F. Bonomi, A. Greenberg, and A. Wong. Scalable architectures for integrated traffic shaping and link scheduling in high-speed ATM switches. IEEE Journal on Selected Areas in Communications, 15(5):938-950, 1997.
[7] K. Richter, M. Jersak, and R. Ernst. A formal approach to MPSoC performance verification. IEEE Computer, 36(4):60-67, April 2003.
[8] H. Schioler, J. Jessen, J. D. Nielsen, and K. G. Larsen. CyNC - towards a general tool for performance analysis of complex distributed real-time systems. In Proceedings of the WiP Session of the 17th EUROMICRO Conference on Real-Time Systems (ECRTS 05), pages 1. IEEE, 2005.
[9] L. Thiele, S. Chakraborty, and M. Naedele. Real-time calculus for scheduling hard real-time systems. In Proc. IEEE International Symposium on Circuits and Systems (ISCAS), volume 4, pages 101-104, 2000.