This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Microprocessors and Microsystems 32 (2008) 413-424
Contents lists available at ScienceDirect
Microprocessors and Microsystems
journal homepage: www.elsevier.com/locate/micpro

FCS/nORB: A feedback control real-time scheduling service for embedded ORB middleware ☆

Xiaorui Wang a,*, Chenyang Lu b, Christopher Gill b
a Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
b Department of Computer Science and Engineering, Washington University in St. Louis, MO 63117, USA

Article info
Article history: Available online 23 May 2008
Keywords: Feedback control; Real-time scheduling; Object request broker; Performance portability; Middleware

Abstract
Object Request Broker (ORB) middleware has shown promise in meeting the functional and real-time performance requirements of distributed real-time and embedded (DRE) systems. However, existing real-time ORB middleware standards such as RT-CORBA do not adequately address the challenges of (1) managing unpredictable workload, and (2) providing robust performance guarantees portably across different platforms. To overcome this limitation, we have developed software called FCS/nORB that integrates a Feedback Control real-time Scheduling (FCS) service with the nORB small-footprint real-time ORB designed for networked embedded systems. FCS/nORB features feedback control loops that provide real-time performance guarantees by automatically adjusting the rate of remote method invocations transparently to an application. FCS/nORB thus enables real-time applications to be truly portable in terms of real-time performance as well as functionality, without the need for hand tuning. This paper presents the design, implementation, and empirical evaluation of FCS/nORB. Our extensive experiments on a Linux testbed demonstrate that FCS/nORB can provide deadline miss ratio and utilization guarantees in the face of changes in platform and task execution times, while introducing only a small amount of overhead.
© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Object Request Broker (ORB) middleware [31,33] has shown promise in meeting the functional and real-time performance requirements of distributed real-time and embedded (DRE) systems built using commercial-off-the-shelf (COTS) hardware and software. DRE systems such as avionics mission computing [8], unmanned flight control systems [2], and autonomous aerial surveillance [23] increasingly rely on real-time ORB middleware to meet challenging requirements such as communication and processing timeliness among distributed application components. Several kinds of middleware are emerging as fundamental building blocks for these kinds of systems. Low-level frameworks such as ACE [30] provide portability across different operating systems and hardware platforms. Resource management frameworks such as Kokyu [10] use low-level elements to configure scheduling and dispatching mechanisms in higher-level middleware. Real-time ORBs such as TAO [31] and nORB [33] are geared toward providing predictable timing of end-to-end method invocations. ORB services such as the TAO Real-Time Event Service [12] and TAO Scheduling Service [10] offer higher-level services for managing functional and real-time properties of interactions between application components. Finally, higher-level middleware services [13,29,39,41] provide integration of real-time resource management in complex vertically layered DRE applications.

☆ This paper is a substantially extended version of a conference paper [25].
* Corresponding author. Tel.: +1 865 974 0627; fax: +1 865 974 5483. E-mail addresses: xwang@eecs.utk.edu (X. Wang), lu@cse.wustl.edu (C. Lu), cdgill@cse.wustl.edu (C. Gill).

However, before it can fully deliver its promise, ORB middleware still faces two key challenges.

Handle unpredictable workloads: The task execution times and resource requirements of many DRE applications are unknown a priori or may vary significantly at run-time, often because their executions are strongly influenced by the operating environment.
For example, the execution time of a visual tracking task may vary dramatically as a function of the number and location of potential targets in a set of camera images sent to it.

Provide real-time performance portability: A key advantage of middleware is supporting portability across different OS and hardware platforms. However, although the functionality of applications running ORB middleware is readily portable, real-time performance can differ significantly across different platforms and ORBs. Consequently, an application that meets all of its timing constraints on a particular platform may violate the same constraints on another platform. Significant time and cost must then be incurred to test and re-tune an application for each platform on which it is deployed. Hence, DRE applications are not strictly portable even when developed using today's ORB middleware. The lack of robust real-time performance portability thus detracts from the benefits of deploying DRE applications on current-generation ORB middleware.

0141-9331/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.micpro.2008.05.002

A key reason that existing ORB middleware cannot deal with the above challenges is that common scheduling approaches are based on open-loop algorithms, e.g., Rate Monotonic Scheduling (RMS) or Earliest Deadline First (EDF) [20], which conduct schedulability analysis based on accurate knowledge of task execution times to provide real-time performance guarantees. However, when workloads and available platform resources are variable or simply not known a priori, open-loop scheduling algorithms either result in extremely underutilized systems based on pessimistic worst-case estimation, or in systems that fail when workloads or platform characteristics vary significantly from design-time expectations.

Our solution is to integrate a Feedback Control real-time Scheduling (FCS) framework [24] with real-time embedded ORB middleware [33], to provide portable real-time performance and robust handling of unpredictable workloads. In contrast to earlier research on real-time scheduling that was concerned with statically assured avoidance of undesirable effects such as overload and deadline misses, FCS algorithms are designed to handle such effects dynamically based on periodic performance feedback. More importantly, FCS algorithms offer an analytic framework to provide soft real-time performance guarantees without underutilizing the system, even when the task execution times are unknown or vary significantly at run-time. While FCS algorithms have previously been analyzed and evaluated only through simulations, this paper presents the design and implementation of an FCS feedback control loop in embedded ORB middleware, and a detailed empirical performance evaluation of the FCS scheduling service with realistic workloads.
FCS/nORB provides key scheduling support that makes DRE software performance (1) more robust against workload variations when tasks have negotiable QoS parameters that can be adjusted and (2) portable across OS and hardware platforms. The FCS service we have implemented in this work automatically adjusts the rates of method invocations on remote application objects, based on measured performance feedback. Our choice of this adaptation mechanism is motivated by the fact that in many DRE applications, e.g., digital feedback control loops [9,32], sensor data display, and video streaming [5], task rates can be adjusted on-line without causing instability or system failure. Other QoS adaptation mechanisms such as online task admission control can also be incorporated easily into the FCS/nORB middleware system.

Specifically, this paper makes three main contributions to research on DRE systems: (1) design documentation of an FCS service at the ORB middleware layer that provides real-time performance portability and robust performance guarantees in the face of workload variations, (2) implementation of a feedback control loop and all of its components (e.g., monitor, controller, etc.) in an embedded ORB middleware that dynamically adjusts the rates of remote method invocations transparently to the application (subject to application-specified constraints), and (3) results of empirical performance evaluations on a physical testbed that demonstrate the efficiency, robustness and limitations of applying FCS at the ORB middleware layer.

The rest of this paper is structured as follows. We first describe the design and implementation of the FCS/nORB middleware in Section 2. We then present results of our empirical performance evaluation on a Linux testbed with both synthetic and realistic workloads in Section 3. Section 4 surveys related work in the areas of real-time scheduling, software performance control, and adaptive resource management in middleware. Finally, Section 5 summarizes the contributions of this paper.

2. FCS/nORB design

In this section, we first present the application model and architecture design of FCS/nORB. We then describe the feedback control loop instantiated in FCS/nORB, and introduce the implementation details of each component in the control loop. We also briefly review the FCS algorithms previously proposed in [24] as part of the controller component, and give the implementation information of the FCS/nORB system.

2.1. Application model

We now describe the application model adopted by FCS/nORB. With ORB middleware, applications typically execute using method invocations on objects distributed across multiple end systems. Invocation latency for remote methods includes latency on the client, the server, and the communication network. Each method invocation may be subject to an end-to-end deadline. An established approach for handling timeliness of remote method invocations is through end-to-end scheduling [21]. In this approach, an end-to-end deadline is divided into intermediate deadlines on the server, client, and communication network, and the problem of meeting the end-to-end deadline is thus transformed into the problem of meeting every intermediate deadline.

In this paper, we focus on the problem of meeting intermediate deadlines on a single processor, the server. To precisely measure the overhead of FCS on the client side and the server side, respectively, we run client and server on two different processors. Such a configuration is common in networked digital control applications that run multiple control algorithms on a server processor that interacts with several other client processors attached to sensors and actuators. Communication delay is not the focus of this paper, although it is possible to treat a network similarly to a processor in an end-to-end scheduling model.

In the rest of this paper, we use the term task to refer to the execution of a remote method on the server. We assume that each task $T_i$ has an estimated execution time $EE_i$ known at design-time. However, the actual execution time of a task may be significantly different from $EE_i$ and may vary at run time.
We also assume that the rate of $T_i$ can be dynamically adjusted within a range $[R_{min,i}, R_{max,i}]$. Earlier research has shown that task rates in many real-time applications (e.g., digital feedback control [7], sensor data update, and multimedia [5,6]) can be adjusted without causing application failure. Specifically, each task $T_i$ is described by the following three attributes:

$EE_i$: the estimated execution time,
$[R_{min,i}, R_{max,i}]$: the range of acceptable rates, and
$R_i(k)$: the rate in the kth control period, with $R_{min,i} \le R_i(k) \le R_{max,i}$.

We use $X(k)$ to represent the value of a variable $X$ in the control period $[(k-1)W, kW)$, where $k \ge 1$ and $W$ is the control period length, which is selected so that multiple instances of each task may be released during a control period. We assume all tasks are periodic,¹ and each task $T_i$'s (relative) deadline on the server, $D_i(k)$, is proportional to its period.

¹ The restrictions can be relaxed to handle aperiodic tasks [24].
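The task attributes above map naturally onto a small record type. The following Python sketch is only an illustration of the model, not code from FCS/nORB: the class name, the field names, the choice of seconds and invocations per second as units, and the `deadline_factor` constant (standing in for "proportional to its period") are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One periodic task T_i in the application model (illustrative only)."""
    name: str
    est_exec_time: float           # EE_i in seconds (a design-time estimate only)
    rate_min: float                # R_min,i in invocations per second (assumed unit)
    rate_max: float                # R_max,i
    rate: float                    # R_i(k), the rate in the current control period
    deadline_factor: float = 1.0   # hypothetical constant: D_i(k) = factor * period

    def set_rate(self, requested: float) -> float:
        """Clamp a requested rate into [R_min,i, R_max,i] and apply it."""
        self.rate = min(self.rate_max, max(self.rate_min, requested))
        return self.rate

    @property
    def period(self) -> float:
        return 1.0 / self.rate

    @property
    def deadline(self) -> float:
        # Relative deadline on the server, proportional to the period.
        return self.deadline_factor * self.period

    @property
    def estimated_utilization(self) -> float:
        # Contribution EE_i * R_i(k) to the total estimated utilization.
        return self.est_exec_time * self.rate


control = Task("control", est_exec_time=0.080, rate_min=1.0, rate_max=5.0, rate=5.0)
control.set_rate(2.0)
print(control.period, control.deadline, control.estimated_utilization)
```

Note that nothing in the record stores the actual execution time: the model only ever relies on the estimate, which is the property the feedback loop is designed to tolerate.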

A key property of our task model is that it does not require accurate knowledge of task execution times. The execution time of a task may be significantly different from its estimate and may vary at run-time.

2.2. FCS/nORB architecture design

FCS/nORB is developed based on the open-source embedded middleware nORB [33], which is a light-weight real-time ORB designed to support networked embedded systems. Both FCS/nORB and nORB are based on ACE [30], an open-source object-oriented (OO) framework that implements many core concurrent communication patterns for platform-independent distributed software.

2.2.1. Lane-based architecture

In FCS/nORB, for each priority level that is used for remote method invocation requests, a lane [29] is established between the server and a client. A lane is composed of three parts: the server side part, the client side part, and a separate TCP connection between the server and the client to avoid priority inversion at the communication layer. As shown in Fig. 1, the client part of a lane has a timer thread and a connection thread that are connected through a FIFO queue. Each pair of timer/connection threads is assigned a priority and submits method invocation requests to the server at this priority. Each timer thread is associated with a timer that generates periodic timeouts, to initiate method invocation requests at a specified rate.

Similarly, the server part of a lane has a pair of worker and connection threads, connected through an intermediate FIFO queue. Each pair of worker/connection threads is assigned a priority and is responsible for processing method invocation requests at that priority. Connection threads receive method invocation requests from multiple clients and then add the requests to the queue. The worker threads remove requests from the queue, invoke the corresponding methods, and send the results back to clients. The separation of worker threads and connection threads simplifies application programming by avoiding the need to handle asynchronous communication.
We apply the RMS policy [20] to assign task priorities to the thread pairs on the server. Each thread pair on the client shares the same priority as the thread pair on the server that it is connected to. A key contribution of this work is to show that feedback control can be realized in reduced-feature-set ORBs such as nORB that are tailored to fit within the space and power limitations seen in many networked embedded systems, without sacrificing real-time performance.

2.2.2. Priority management

The implementation of feedback control introduces new challenges to the design of scheduling mechanisms in ORB middleware. For instance, the rate adaptation mechanism adopted by FCS/nORB and several other projects [36,38] may dynamically change the rates of real-time tasks. When task priorities are determined by task rates (e.g., in the Rate Monotonic Scheduling (RMS) policy), this may cause the middleware to continuously change the priorities of all its tasks. To satisfy the special requirements posed by rate adaptation, FCS/nORB is configured with the priority-per-lane concurrency architecture.

Traditionally, a concurrency architecture called thread-per-priority has been adopted in most existing DRE middleware (e.g., [31]), including nORB. In that model, the same thread is responsible for executing all tasks with the same priority. This is because the workload is assumed to use only a limited number of fixed task rates. However, this concurrency architecture is not suitable for rate adaptation. Due to rate adaptation, the rates and thus the priorities of real-time tasks vary dynamically at run-time. In such a situation, the thread-per-priority architecture would require the ORB to dynamically move a task from one thread to another thread, which may introduce significant overhead. To avoid this problem, FCS/nORB implements the priority-per-lane concurrency architecture that executes each task in a separate lane with a separate priority. FCS/nORB adjusts the priorities of the lanes, and thus the priorities of the threads in the lanes, only when the order of the task rates is changed.
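The priority-per-lane idea can be illustrated with a few lines of Python. This is a sketch under stated assumptions rather than the FCS/nORB implementation: the class `LanePriorityManager`, its methods, and the tie-breaking by lane name are hypothetical, and real lanes would map the resulting order onto native thread priorities.

```python
def rms_priority_order(rates):
    """Return lane names from highest to lowest RMS priority (highest rate first).

    Ties are broken by lane name so the order is deterministic (an assumption
    of this sketch, not something the paper specifies).
    """
    return sorted(rates, key=lambda lane: (-rates[lane], lane))


class LanePriorityManager:
    """Tracks one priority per lane and re-ranks only when the rate order changes."""

    def __init__(self, rates):
        self.order = rms_priority_order(rates)
        self.reassignments = 0

    def update(self, rates):
        """Apply new rates; return True only if lane priorities had to change."""
        new_order = rms_priority_order(rates)
        if new_order == self.order:
            return False          # rates changed, order did not: nothing to do
        self.order = new_order
        self.reassignments += 1
        return True

    def priority_of(self, lane):
        """0 denotes the highest priority in this sketch."""
        return self.order.index(lane)


manager = LanePriorityManager({"guide": 1.0, "control": 5.0, "nav": 2.0})
# Scaling every rate by the same factor leaves the order, and the priorities, intact.
print(manager.update({"guide": 0.5, "control": 2.5, "nav": 1.0}))   # False
# Only a change in the relative order triggers a priority reassignment.
print(manager.update({"guide": 0.5, "control": 2.5, "nav": 6.0}))   # True
```

The design point the sketch tries to make visible is that the expensive operation (changing thread priorities) is keyed to the rate order, not to the rates themselves.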
While the task rates may vary at every control period, the order of task rates often changes at a much lower frequency, especially when FCS/nORB adopts the proportional rate adjustment policy, which is introduced in Section 2.3.4. Therefore, the priority-per-lane architecture enables FCS/nORB to adapt task rates in a more flexible way, with less overhead.

A potential advantage of the thread-per-priority architecture is that it may need fewer threads to execute applications when multiple tasks share the same rate (i.e., priority). However, as FCS/nORB is targeted at memory-constrained networked embedded systems that commonly have a limited number of tasks on a processor, each task can be easily mapped to a lane with a unique native thread priority even in a priority-per-lane architecture.

Fig. 1. The architecture of FCS/nORB (server: worker and connection threads, miss monitor, utilization monitor, controller, rate assigner; client: timer and connection threads, rate modulator; RMI lanes and a feedback lane connect the two).

2.3. FCS/nORB feedback control loop

The core of FCS/nORB is a feedback control loop that periodically monitors and controls its controlled variables by adjusting QoS parameters (e.g., task rates or service levels). Candidate controlled variables include the total (CPU) utilization and the (deadline) miss ratio. The utilization, $U(k)$, is defined as the fraction of time when the CPU is busy in the kth control period. The miss ratio, $M(k)$, is the number of deadlines missed divided by the total number of completed tasks² in the kth control period. Performance references represent the desired values of the controlled variables, i.e., the desired miss ratio $M_s$ or the desired utilization $U_s$. For example, a particular system may require a miss ratio $M_s = 1.5\%$ or a utilization $U_s = 70\%$. The goal of the feedback control loop is to enforce the performance references specified by the application, via run-time QoS adaptation.

² When a task has a firm deadline, it may be aborted when it misses its deadline. An aborted task is counted as a completed one and a deadline miss for miss ratio calculation.

The feedback control loop of FCS/nORB on the server side includes a utilization monitor, a miss ratio monitor, a controller, a rate assigner, and a pair of FCS/connection threads. The control loop on a client includes a rate modulator and a pair of FCS/connection threads. All FCS/connection threads in the FCS service are assigned the highest priority so that the feedback control loop can run in overload conditions, when it is needed most. The FCS/connection threads on the server are connected with each client connection thread through a TCP connection we call a feedback lane. We now present the details of each component.

2.3.1. Utilization monitor

The utilization monitor uses the /proc/stat file in Linux to estimate the CPU utilization in each control period. The /proc/stat file records the number of jiffies (each 1/100 of a second) since the system start time that the CPU has spent in user mode, user mode with low priority (nice), system mode, and the idle task. At the end of each control period, the utilization monitor reads the counters and estimates CPU utilization as one minus the ratio of the number of jiffies used by the idle task in the last control period to the total number of jiffies in the same period. We note that the same technique is used by the benchmarking tool NetPerf [28].

2.3.2. Deadline miss monitor

The deadline miss monitor measures the percentage of completed tasks that miss their deadlines on the server in each control period. FCS/nORB maintains two counters for each pair of connection/worker threads on the server. One counter records the number of completed tasks in the current control period, and the other records the number of tasks that missed their deadlines in the same period. Each connection thread timestamps every method invocation request when it arrives from its nORB lane. The worker thread checks whether a completed task has missed its deadline and updates the counters after it sends the invocation result to the client. At the end of each control period, the deadline miss monitor aggregates the counters of all worker/connection threads, and computes the deadline miss ratio in the control period. Note that FCS/nORB maintains separate counters for each pair of connection/worker threads instead of shared global counters, to reduce contention among threads updating the counters. This use of thread-specific storage is important because contention among worker threads could either allow priority inversions or introduce unnecessary overhead to prevent them.

2.3.3. Controller

The controller implements the three control algorithms presented in Section 2.4. Each time its periodically scheduled timer fires, the controller invokes the utilization and/or deadline miss monitors, computes the total estimated utilization for the next control period, and then invokes the rate assigner.

2.3.4. Rate assigner

The rate assigner on the server and the rate modulator on its clients together serve as actuators in the feedback control loop. The rate assigner computes the new task rates to enforce the total estimated utilization computed by the controller. Different policies can be applied to assign task rates.
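One way to picture how the two monitors feed the controller and a pluggable rate assignment policy in each control period is the following Python sketch. It is a simplified illustration, not the FCS/nORB code: all function and class names are hypothetical, the utilization is computed from the text of the aggregate `cpu` line of /proc/stat passed in as a string (so the example runs without a Linux host), and the controller and policy are supplied by the caller as plain callables.

```python
def parse_cpu_jiffies(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Field order on Linux: user, nice, system, idle, then (on later
            # kernels) further counters. Only the first eight are summed here,
            # because guest time is already included in user time.
            counters = [int(x) for x in fields[1:9]]
            return counters[3], sum(counters)
    raise ValueError("no aggregate cpu line found")


def utilization(prev, curr):
    """Busy fraction between two (idle, total) samples: 1 - idle/total."""
    idle, total = curr[0] - prev[0], curr[1] - prev[1]
    return 1.0 - idle / total if total > 0 else 0.0


class LaneCounters:
    """Per-lane counters, kept separate to avoid a shared global counter."""

    def __init__(self):
        self.completed = 0
        self.missed = 0

    def record(self, arrival, finish, relative_deadline):
        self.completed += 1
        if finish - arrival > relative_deadline:
            self.missed += 1


def miss_ratio(lanes):
    """Aggregate all lane counters for one control period, then reset them."""
    completed = sum(lane.completed for lane in lanes)
    missed = sum(lane.missed for lane in lanes)
    for lane in lanes:
        lane.completed = lane.missed = 0
    return missed / completed if completed else 0.0


def control_period(u, m, controller, assign_rates):
    """One iteration: monitor outputs -> controller -> rate assignment policy."""
    budget = controller(u, m)        # total estimated utilization for next period
    return assign_rates(budget)      # any policy can be plugged in here


prev = parse_cpu_jiffies("cpu  100 0 50 850\ncpu0 100 0 50 850\n")
curr = parse_cpu_jiffies("cpu  380 0 90 930\n")
print(utilization(prev, curr))
```

Passing the policy as a callable mirrors the remark that different policies can be applied to assign task rates; the loop itself does not need to know which one is in use.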
Our rate assigner currently implements a simple policy that is called Proportional Rate Adjustment (PRA) in this paper. Assuming that the initial rate of task $T_i$ is $R_i(0)$, the initial total estimated utilization is $B(0) = \sum_i EE_i R_i(0)$, and the total estimated utilization for the following (kth) control period is $B(k)$, the PRA policy assigns the new rate to task $T_i$ as follows: $R_i(k) = (B(k)/B(0)) R_i(0)$. If $R_i(k)$ falls outside its acceptable range $[R_{min,i}, R_{max,i}]$, it is rounded to the closer limit. It can be easily proven that PRA enforces the total estimated utilization, i.e., $B(k) = \sum_i EE_i R_i(k)$, if no task rates reach their lower or upper limits.

The PRA policy treats all the tasks fairly in the sense that the relative rates among tasks always remain the same if no tasks reach their rate limits. When an application runs on a faster platform, the rates of all tasks will be increased proportionally, while on a slower platform, the rates of all tasks will be decreased proportionally. A side effect of the PRA policy is that priorities of tasks will not change at run-time under RMS because the relative order of task rates remains the same. This reduces overhead on the clients because they do not need to change thread priorities on the fly. However, since PRA potentially changes the rate of every task in each control period, it may introduce relatively high overhead for resetting all the timers on the clients. Fortunately, as shown in our measurements, such overhead is small when ACE timers are used.

Note that the PRA policy is based on the assumption that all tasks are equally important. When this assumption is not true, the rate assigner needs to optimize the total system value under the constraint of the total estimated utilization. Although the value optimization problem is not a focus of this study, existing optimization algorithms, e.g., [17], could be used in the rate assigner to address this problem.

2.3.5. Rate modulator

A rate modulator is located on each client.
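The two actuators can be sketched together in Python: the server-side PRA computation of Section 2.3.4 and the client-side conversion of assigned rates into timer intervals. This is an illustrative sketch, not the FCS/nORB implementation; the function names are hypothetical, the example values are taken from the avionics task set of Table 2, and reading the rates in that table as invocations per second is an assumption.

```python
def estimated_utilization(exec_times, rates):
    """B = sum_i EE_i * R_i for the given rates."""
    return sum(exec_times[task] * rates[task] for task in rates)


def pra_assign(initial_rates, budget, initial_budget, limits):
    """Proportional Rate Adjustment: R_i(k) = (B(k)/B(0)) * R_i(0), clamped."""
    scale = budget / initial_budget
    new_rates = {}
    for task, r0 in initial_rates.items():
        low, high = limits[task]
        # A rate outside [R_min,i, R_max,i] is rounded to the closer limit.
        new_rates[task] = min(high, max(low, scale * r0))
    return new_rates


def timer_intervals_ms(rates):
    """Client side: the timer interval (ms) that realizes each assigned rate."""
    return {task: 1000.0 / rate for task, rate in rates.items()}


# Estimated execution times in seconds and rate ranges, after Table 2
# (rates assumed to be in invocations per second).
exec_times = {"guide": 0.100, "control": 0.080, "slow_nav": 0.100,
              "fast_nav": 0.060, "target": 0.150}
limits = {"guide": (0.2, 1.0), "control": (1.0, 5.0), "slow_nav": (0.2, 1.0),
          "fast_nav": (1.0, 5.0), "target": (0.2, 5.0)}
initial_rates = {task: high for task, (low, high) in limits.items()}

b0 = estimated_utilization(exec_times, initial_rates)
rates = pra_assign(initial_rates, budget=b0 / 2, initial_budget=b0, limits=limits)
print(rates)
print(timer_intervals_ms(rates))
```

When no rate is clamped, the assigned rates reproduce the requested budget exactly; once some tasks hit a limit, the enforced utilization deviates from the budget, which is the caveat stated for PRA above.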
The rate modulator receives the new rates for its remote method invocation requests from the server side rate assigner through the feedback lane, and resets the interval of the timer threads whose request rates have been changed.

2.4. FCS control algorithms

The FCS/nORB controller implements three FCS algorithms developed based on the choice of different sets of controlled variables [24]. The FC-U and FC-M algorithms control the utilization $U(k)$ and the miss ratio $M(k)$, respectively, and the FC-UM algorithm controls both $U(k)$ and $M(k)$ at the same time. The utilization and miss ratio monitors measure the controlled variables, $U(k)$ and $M(k)$, respectively. At the end of each control period, the controller compares the controlled variable with its corresponding performance reference ($U_s$ or $M_s$), and computes $B(k+1)$, the total estimated utilization for the subsequent control period. The QoS actuators then adjust the tasks' QoS parameters to enforce the total estimated utilization on the server. For example, a rate actuator assigns a new set of task rates such that $B(k+1) = \sum_i EE_i R_i(k+1)$, and instructs each client to adjust its invocation rate accordingly. Other examples of QoS actuation mechanisms include admission control and adaptation techniques based on the imprecise computation model [22]. It is important to note that $B(k)$ may be different from $U(k)$ due to the difference between the estimated and actual task execution times.

We now briefly review the three FCS algorithms presented in our earlier work [24] to provide a complete understanding of the controller component. The key contributions of this paper lie in the architecture design and implementation of the whole feedback control loop in embedded ORB middleware and the extensive empirical evaluation to test its efficiency and limitations.

2.4.1. FC-U

FC-U embodies a feedback loop to enforce a specified utilization. Pseudo-code for the FC-U algorithm is shown in Fig. 2. FC-U is appropriate for systems with a known (schedulable) utilization bound. In such systems, FC-U can guarantee a zero miss ratio in steady-state if $U_s$ is lower than the utilization bound. However, FC-U is not applicable to a system whose utilization bound is unknown, and it may underutilize the system when its utilization bound is highly pessimistic.

Fig. 2. Pseudo-code of FC-U.

2.4.2. FC-M

Unlike FC-U, which controls the miss ratio indirectly through utilization control, FC-M utilizes a feedback loop to control the miss ratio directly. Pseudo-code for the FC-M algorithm is shown in Fig. 3. Compared with FC-U, the advantage of FC-M is that it does not depend on any knowledge about the utilization bound. It may also achieve a higher CPU utilization than FC-U, whose utilization reference (based on a theoretical utilization bound) is often pessimistic. However, because the miss ratio does not indicate the extent of underutilization when $M(k) = 0$, FC-M must have a positive miss ratio reference (i.e., $M_s > 0$). Consequently, it will have a small but non-zero miss ratio even in steady-state [24]. Therefore, FC-M is only applicable to soft real-time systems that can tolerate sporadic deadline misses in steady-state.

2.4.3. FC-UM

FC-UM integrates miss ratio control and utilization control to combine their advantages. Pseudo-code for the FC-UM algorithm is shown in Fig. 4. The advantage of FC-U is its ability to meet all deadlines ($M(k) = 0$) in steady-state if the utilization reference is lower than the utilization bound. The advantage of FC-M is that it can achieve a low (but non-zero) miss ratio and higher utilization even when the utilization bound is unknown or pessimistic.
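The general shape of the three controllers can be sketched in Python. This is a sketch under explicit assumptions, not a transcription of the pseudo-code figures: it assumes a proportional update $B(k+1) = B(k) + K(\text{reference} - \text{measurement})$, and it assumes that FC-UM applies the smaller of the two candidate changes; both should be checked against [24]. The toy plant model in `simulate_fc_u`, where the actual utilization is a hypothetical gain times the estimated utilization and saturates at 100%, is likewise only illustrative.

```python
def fc_u(budget, utilization, u_ref, k_u):
    """Utilization control: raise or lower B by K_u times the utilization error."""
    return budget + k_u * (u_ref - utilization)


def fc_m(budget, miss_ratio, m_ref, k_m):
    """Miss ratio control: raise or lower B by K_m times the miss ratio error."""
    return budget + k_m * (m_ref - miss_ratio)


def fc_um(budget, utilization, miss_ratio, u_ref, m_ref, k_u, k_m):
    """Integrated control (assumed rule): apply the more conservative change."""
    change_u = k_u * (u_ref - utilization)
    change_m = k_m * (m_ref - miss_ratio)
    return budget + min(change_u, change_m)


def simulate_fc_u(gain, periods, u_ref=0.70, k_u=0.185, budget=0.70):
    """Toy closed loop: actual utilization = min(1, gain * estimated utilization).

    `gain` models actual execution times being `gain` times their estimates
    (a hypothetical plant model used only for this illustration).
    """
    utilization = min(1.0, gain * budget)
    for _ in range(periods):
        budget = fc_u(budget, utilization, u_ref, k_u)
        utilization = min(1.0, gain * budget)
    return budget, utilization


# Execution times twice as long as estimated: the budget settles near 0.35
# so that the measured utilization settles near the 70% reference.
print(simulate_fc_u(gain=2.0, periods=100))
```

In the toy loop the controller never learns the gain; it only reacts to the measured utilization, which is the sense in which the approach does not depend on accurate execution time estimates.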
Through integrated control, FC-UM aims to achieve the advantages of both the FC-U and FC-M algorithms. In a system with an FC-UM scheduler, the system administrator can simply set the utilization reference $U_s$ to a value that causes no deadline misses in the nominal case (e.g., based on system profiling or experience), and set the miss ratio reference $M_s$ according to the application's miss ratio requirement. FC-UM can guarantee zero deadline misses in the nominal case while also guaranteeing that the miss ratio stays close to $M_s$ even if the utilization reference turns out to be higher than the (unknown) utilization bound of the system.

Fig. 3. Pseudo-code for FC-M.

Fig. 4. Pseudo-code for FC-UM.

Based on control theory, the FCS algorithms can be proven to have the following control properties. First, system stability is guaranteed as long as the actual task execution times remain within a specified range of (several times greater or smaller than) their estimated values. Second, the performance references (miss ratio or utilization) can be precisely achieved in steady-state as long as the system is within its stability range, even when task execution times vary at run-time. Finally, the parameters $K_u$ and $K_m$ can be determined based on a trade-off between the stability range and the system settling time. Small values of $K_u$ and $K_m$ may cause a long settling time for the system to achieve the desired performance, while large values may affect the system stability. The detailed proofs are not shown due to space limitations but are available in [24].

2.5. Implementation

FCS/nORB 1.0 is implemented in C++ using ACE 5.2.7 on Linux. The entire FCS/nORB middleware (excluding the code in the ACE library and IDL compiler library) is implemented in 7898 lines of C++ code, compared to 4586 lines of code in the original nORB. Both nORB and FCS/nORB are open-source software and can be downloaded from:

nORB: http://deuce.doc.wustl.edu/norb/
FCS/nORB: http://www.ece.utk.edu/~xwang/rtes/fcs_norb/

3. Empirical evaluations

In this section, we present the results of four sets of experiments we ran on a Linux testbed. Experiment I evaluated the performance portability of applications on FCS/nORB on two different server platforms. On both platforms, we first ran the same synthetic workload for which the actual task execution times significantly deviate from their estimated execution times (the same estimates were used in all experiments). Experiment II stress-tested FCS/nORB's ability to provide robust performance guarantees with a workload whose task execution times varied dramatically at run-time. Experiment III adopted an image matching workload that is representative of target location applications, to examine FCS/nORB's robust performance guarantees in a realistic environment. Finally, Experiment IV measured the overhead introduced by the FCS control service from three different perspectives.

3.1. Experimental set-up

3.1.1. Platform

We performed our experiments on three PCs named Server A, Server B, and Client. Server A and Client were Dell 1.8 GHz Celeron PCs, each with 512 MB of RAM. Server A and Client were directly connected with a 100 Mbps crossover Ethernet cable. They both ran Red Hat Linux release 7.3 (Kernel 2.4.19). Server B was a Dell 1.99 GHz Pentium4 PC with 256 MB of RAM.
Server B and Client were connected through our departmental 100 Mbps LAN. Server B ran Red Hat Linux release 7.3 (Kernel 2.4.18). Server A and Server B served as servers in separate experiments, while Client served as the only client host in all experiments.

3.1.2. Workload

To evaluate the robustness of FCS/nORB, we used both a synthetic workload and a more realistic one that simulated real applications in our experiments. Since we focused on unpredictable workload and platform portability, the estimated execution times were different from the actual execution times in all experiments. The same estimated execution times were used in all experiments despite the fact that they used different platforms and as a result had different actual task execution times. With FCS, re-profiling of task execution times was not needed to provide performance guarantees.

The synthetic workload comprised 12 tasks. Each task periodically invoked one of three methods (shown in Table 1) of an application object. All the tasks invoking the same method shared the same maximum rate, but their minimum rates were randomly chosen from a range listed in the min rate column in Table 1.

The realistic workload comprised an avionics task set and an additional target location task. The avionics task set is based on an F-16 simulator presented in [1]. It includes four separate tasks (guide, control, and slow and fast navigation) with different rate ranges and execution times, as shown in Table 2. We chose these tasks in our workload because their rate ranges are available [1].³ The additional target location task is included because of its relatively high computing intensity and its potential execution time variation at run-time. Those two properties make the simulated avionics system suffer run-time performance variation, which provides a typical scenario for applying FCS. A common solution for target location includes a series of steps including image restoration and enhancement, geometric correction, image matching, etc.
For the sake of simplicity, we implemented only the most critical step, image matching [27], in our experiment. The goal of image matching is to search input images (periodically captured by camera equipment) for a target, which is represented by another, smaller template image. Specifically, every pixel in the input image is potentially part of where the target is located, so all pixels are candidate points. All candidate points are checked exhaustively for their similarity values with the target template (some advanced image matching algorithms check only a subset of all pixel positions). At each of the candidate points, a candidate region with the same size as the target template is extracted from the input image and compared with the target template pixel by pixel. The individual similarity values of all corresponding pixel pairs are summed up as the overall similarity value for this candidate point. The candidate point with the largest similarity is identified as the match place, so long as its similarity value is larger than a pre-defined threshold. A target is considered to have been located when its match place is found. In our experiment, absolute difference (AD) [27] is used to compute the similarity.

Footnote 3: We acknowledge that FCS/nORB may not be directly applicable to safety-critical real-time systems such as flight control for manned aircraft, which require hard performance guarantees.

Table 1
Methods invoked by the workload
Method   Estimated execution time (ms)   Min rate    Max rate   Number of invoking tasks
1        8.4                             [1.1,2.1]   35         6
2        1.2                             [1.3,1.9]   50         2
3        7.0                             [1.2,2.2]   40         4

Table 2
Task sets in real image matching workload
Task              Est. execution time (ms)   Min rate   Max rate
Guide             100                        0.2        1.0
Control           80                         1.0        5.0
Slow navigation   100                        0.2        1.0
Fast navigation   60                         1.0        5.0
Target location   150                        0.2        5.0

Fig. 5. Images used in Experiment II.

The application scenario for our experiment is as follows. Before the target object of interest is located, the image matching task searches the full input image for a match with the template. After an object is found at a particular location, the focus region is shrunk from the full image to a small region centered at the known location of the object in subsequent images, which saves CPU cycles for other tasks. However, in some cases a fast-moving object may escape from the focus region between two consecutive invocations of the task, resulting in the loss of the object. In this situation, the full image must be searched again to relocate the target. Therefore, in our scenario the execution time of the image matching task starts at a high level, then drops to a low level when the target has been detected. After the target is lost, the execution time returns to its initial high level.
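The exhaustive matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function name `match_template`, the choice to negate the summed absolute difference so that a larger value means a better match, and the restriction to positions where the template fits entirely inside the image are all assumptions made for the example.

```python
def match_template(image, template, threshold):
    """Exhaustively search `image` for `template` using absolute difference.

    Returns (row, col, score) for the best candidate point, or None when the
    best candidate does not exceed `threshold`.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    # Assumption: only positions where the template fits entirely are checked.
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            # Sum the per-pixel absolute differences over the candidate region.
            sad = sum(
                abs(image[r + i][c + j] - template[i][j])
                for i in range(tpl_h)
                for j in range(tpl_w)
            )
            score = -sad  # assumption: similarity is the negated difference
            if best is None or score > best[2]:
                best = (r, c, score)
    if best is not None and best[2] > threshold:
        return best
    return None


# Tiny hypothetical grayscale image with the target at row 1, column 1.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
print(match_template(image, template, threshold=-2))
```

The loop bounds depend only on the image and template sizes, never on pixel values, which is why the same input image can be reused in every invocation without changing the task's execution time.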
The variation in the execution time is unknown a priori because it depends on whether the target is being searched for in the full input image or in the focus region. Fig. 5 shows the images used in our experiment. Since the execution time of exhaustive image matching with the AD algorithm depends only on the sizes of the images and is insensitive to their contents, the same input image (Fig. 5a) can be used in every invocation of the image matching task without affecting the task workload, which is the main concern of FCS/nORB. Similarly, the same focus region (Fig. 5b) can also be used. The switch between the full input image and the focus region is forced in order to simulate target capture and loss. In the sequence described in the above scenario, we first use the full input image to search for the target template. At a certain time the target is found, and we then search the focus region image for a while. Finally, we change back to the input image by assuming the target is lost from the focus region. Although the images used in our experiments are simplified compared to real-world scenarios, they are sufficient for the purpose of causing realistic variations in the task execution time.

Table 3
FCS configuration in all experiments
                 FC-U        FC-M         FC-UM
Reference        U_s = 70%   M_s = 1.5%   M_s = 1.5%, U_s = 75%
K_u, K_M         K_u = 0.185, K_M = 0.414
Control period   4 s (10 s in Experiment II)

3.1.3. FCS configuration

The configuration parameters for FCS are shown in Table 3. To demonstrate the robustness of feedback control, the same configuration was used in all experiments even though they were performed on different platforms and tested with different workloads and execution times. The controller parameters K_u and K_M were computed using control theory based on a trade-off between stability range and system settling time [24]. The utilization reference of FC-U is chosen to be 70%, slightly lower than the RMS schedulable utilization bound for 12 tasks: 12(2^{1/12} - 1) ≈ 71%. FC-UM had a higher utilization reference (75%) because it also uses miss ratio control, as we discussed in Section 3.
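The utilization bound quoted for 12 tasks follows from the Liu and Layland formula n(2^{1/n} - 1) [20]. The short sketch below evaluates it; the function name `rms_utilization_bound` is chosen here for illustration only.

```python
def rms_utilization_bound(n):
    """Liu and Layland schedulable utilization bound for n periodic tasks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (2 ** (1.0 / n) - 1)


bound = rms_utilization_bound(12)
print(f"RMS bound for 12 tasks: {bound:.4f}")
```

For n = 12 the bound is roughly 0.714, so a 70% utilization reference leaves a small margin below it.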
The control period used in Experiments I and III is 4 s. Since in the real image matching workload task 1 and task 3 both have 5 s as their maximum period, we set the control period in Experiment II to 10 s to decrease the sampling jitter caused by rate tuning. As a baseline, we also ran these experiments under open-loop scheduling (RMS) by turning off the feedback loop. For simplicity, the open-loop baseline is called OPEN in the rest of the paper.

3.2. Experiment I: performance portability

In Experiment I, the execution time of each task on Server A remained approximately twice its estimated value throughout each run. The purpose of this set of experiments was to evaluate the performance of the FCS algorithms and OPEN when task execution times vary significantly from estimated values, either due to the difference between a new deployment platform and the original platform on which the tasks were profiled, or due to significant inaccuracy in task profiling.

Our first experiment emulates common engineering practice based on open-loop scheduling. We first tuned task rates based on the estimated execution times so that the total estimated utilization was 70%. However, when we ran the tasks at these rates, the server locked up. This is not surprising: since the estimated execution times were inaccurate, the actual total utilization requested by all nORB threads reached approximately 140%. This caused the Linux kernel to freeze because all nORB threads were run (with root privilege) at real-time scheduling priorities that are higher than kernel priorities on Linux. When the CPU utilization requested by nORB threads reached 100%, no kernel activities were able to execute. To avoid

this problem using common real-time engineering techniques, all the tasks would need to be re-profiled for each platform on which the application is deployed. Hence, the open-loop approach can cost developers significant time to tune the workload to achieve the same performance on different platforms. This lack of performance portability is an especially serious problem when there is a large number of potential platforms (e.g., in a product line) or if a potential platform is unknown at system development time.

Fig. 6. A typical run of FC-U on Server A (x-axis: time in 4 s control periods; plotted series include B(k)).
Fig. 8. A typical run of FC-U on Server B (x-axis: time in 4 s control periods; plotted series include B(k)).

We now examine the experimental results for the FCS algorithms themselves. As an example, Fig. 6 illustrates the sampled utilization U(k), the miss ratio M(k), and the total estimated utilization B(k) computed by the controller in a typical run under FC-U. All tasks started from their lowest rates. The feedback control loop rapidly increased U(k) by raising task rates (proportional to B(k)). At the 5th sampling point, U(k) reached 67.7% and then settled in a steady state around 70%. This result shows that FC-U can self-tune task rates to achieve the specified CPU utilization even when task execution times are significantly different from estimated values. The results are consistent with the control analysis presented in [24].

The performance results for FC-U, FC-M, and FC-UM on Server A are summarized in Fig. 7a-c. The performance metrics we used were the miss ratio and utilization in steady state, and the settling time. The steady-state miss ratio is defined as the average miss ratio in steady state. The steady-state utilization is similarly defined as the average utilization in steady state. Both metrics measure the performance of a system after its adaptation process settles down to a steady state. Settling time represents the time it takes the system to settle down to a steady state.
The settling time can also be viewed as the duration of the self-tuning period after an application is ported to a new platform. It is usually difficult to determine the precise settling time on a noisy, real system. As an approximation, we considered that FC-U and FC-UM entered a steady state at the first sampling instant when U(k) reached 0.99U_s, and that FC-M entered a steady state at the first sampling instant when U(k) reached 0.99 of its value in the last control period of the experiment. Each data point in Fig. 7a-c is the mean of three repeated runs, and each run took 800 s. The standard deviations in miss ratio, utilization, and settling time are below 0.01%, 0.03%, and 6.11 s (i.e., 1.53 control periods), respectively.

From Fig. 7a, we can see that both FC-U and FC-UM caused no deadline misses in steady state. FC-M's steady-state miss ratio is 1.49%, compared to the miss ratio reference of 1.5%. At the same time, the steady-state utilizations of FC-U and FC-UM are 70.01% and 74.97%, compared to their respective utilization references of 70% and 75%. The result for FC-UM occurred because utilization control dominated in steady state, since its steady-state utilization is lower than that of miss ratio control. In contrast, FC-M achieved a higher utilization (98.93%) in steady state at the cost of a slightly higher miss ratio. As shown in Fig. 7c, FC-M and FC-UM both had significantly longer settling times than FC-U due to the saturation of miss ratio control when the system is underutilized. This means that FC-M and FC-UM need more self-tuning time before they can reach steady state. Note that the settling times of FC-M and FC-UM are related to the initial task rates. In our experiments, all tasks started from their lowest possible rates at the beginning of the self-tuning phase. The settling times can be reduced by setting the initial task rates closer to the desired rates. For example, we may choose the initial rates to be the same as the desired rates on the slowest platform in a product line.
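To make the settling-time rule concrete, the sketch below simulates a utilization control loop and applies the 0.99U_s criterion. It rests on several assumptions that are not taken from this section: the controller is modeled as the proportional update B(k+1) = B(k) + K_u(U_s - U(k)), the platform is modeled as U(k) = min(1, G·B(k)) with G = 2 (actual execution times twice the estimates, as in Experiment I), and the starting value B(0) = 0.05 is hypothetical. It is an idealized, noise-free model, not a reproduction of the measured runs.

```python
def simulate_fc_u(u_ref=0.70, k_u=0.185, gain=2.0, b0=0.05, periods=50):
    """Return the utilization sampled in each control period."""
    b = b0                      # total estimated utilization B(k)
    utilization = []
    for _ in range(periods):
        u = min(1.0, gain * b)  # assumed platform model: U(k) = G * B(k)
        utilization.append(u)
        b += k_u * (u_ref - u)  # assumed proportional controller update
    return utilization


def settling_index(samples, reference, fraction=0.99):
    """First sampling instant at which the samples reach fraction * reference."""
    for k, value in enumerate(samples):
        if value >= fraction * reference:
            return k
    return None


u = simulate_fc_u()
k_settle = settling_index(u, 0.70)
steady = u[k_settle:]
steady_mean = sum(steady) / len(steady)
print(k_settle, round(steady_mean, 4))
```

Under this model the error shrinks by a constant factor (1 - K_u·G) each period, so a larger platform gain G shortens the settling time and a smaller one lengthens it.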
Fig. 7. Performance results of FCS algorithms on Server A in Experiment I: (a) average steady-state miss ratio; (b) average steady-state CPU utilization; (c) average settling time (in 4 s control periods).

To further evaluate the performance portability of FCS/nORB, we re-ran the same experiments on Server B. Typical runs of FC-U, FC-UM, and FC-M are shown in Figs. 8-10, respectively. Each run takes 1200 s. As was the case on Server A, all the algorithms successfully enforced their utilization and/or miss ratio references in steady state. The difference is that all tasks ran at a higher rate (proportional to B(k)) on Server B than on Server A because Server B is faster than Server A. In addition, all algorithms had longer settling times on Server B than on Server A. This is consistent with our control analysis in [24].

In summary, Experiment I demonstrated that FCS/nORB can provide a desired utilization or miss ratio even when (1) applications were ported to different platforms and (2) task execution times were significantly different from their estimates. Therefore, FCS/nORB represents a way to perform automatic performance tuning on a new platform.

In addition, we note that a combination of FCS and open-loop scheduling can be used to achieve both self-tuning and run-time efficiency for applications with steady workloads. When an application is ported to a new platform, it can be scheduled initially using the FCS algorithm to converge to a steady state with the desired performance. Then the feedback control loop can be turned off and the application can continue to run at the correct rates under open-loop scheduling.

3.3. Experiment II: varying realistic workload

Fig. 11. A typical run of OPEN under realistic workload (x-axis: time in 10 s control periods).

In Experiment II, we examined FCS/nORB's performance with the realistic workload, in which the execution time of the target location task varies dynamically. Fig. 11 shows a typical run of OPEN. At the beginning of the run, the target location task had a long execution time while it searched the whole input image for the object of interest. Consequently, the CPU utilization was close to 95% and a number of task invocations missed their deadlines.
At around the 160th control period, the target was assumed to have been found, so the focus region was shrunk to locate the target. CPU utilization dropped significantly, as can be observed in Fig. 11. This drop switched the system from an overloaded to an underutilized state. We continued by assuming the target escaped from the focus region at around the 265th control period, so the execution time of the target location task then returned to its original level. At that point the system returned to an overload condition. Hence, in this scenario, the OPEN system just switched back and forth between overload with deadline misses and underutilization with unnecessarily low task rates, neither of which leads to satisfactory performance.

Figs. 12 and 13 show that both FC-U and FC-UM maintained the specified utilization levels in their steady states, which covered most of the run. The performance of FC-U is illustrated in Fig. 12. The CPU utilization was decreased to the set point (70%) at the 15th period, so deadline misses were avoided. At around the 160th period, when the system found the object, FC-U drove the utilization back up to the set point by increasing the rates of all current tasks to utilize the CPU better. The faster rates also improved the system utility. For our image matching task in particular, more frequent invocation improves tracking precision while reducing the chance that the tracked object may escape from the window image. At the 165th control period, when the target did escape from the focus region, the full image was searched again to relocate the target. FC-U had only a very brief spike of deadline misses, which is highly preferable compared with OPEN. The performance of FC-UM, shown in Fig. 13, was similar to that of FC-U. The only difference is that the settling time was longer when the system recovered from the underutilized state, for the same reasons explained in the previous section.

Fig. 9. A typical run of FC-UM on Server B (x-axis: time in 4 s control periods; plotted series include B(k)).
Fig. 10. A typical run of FC-M on Server B (x-axis: time in 4 s control periods; plotted series include B(k)).
Fig. 12. A typical run of FC-U under varying workload (x-axis: time in 10 s control periods; plotted series include B(k)).
Fig. 13. A typical run of FC-UM under varying workload (x-axis: time in 10 s control periods; plotted series include B(k)).
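The contrast between OPEN and FC-U under a varying execution time can be illustrated with a small simulation. Everything numeric here is hypothetical: the execution-time gains (1.35 while the full image is searched, 1.0 while the focus region is searched), the phase lengths, and the starting value B = 0.70 are chosen only for illustration, and the controller is again assumed to be the proportional update B(k+1) = B(k) + K_u(U_s - U(k)). The model ignores deadline misses and kernel starvation.

```python
def run(gains, controlled, u_ref=0.70, k_u=0.185, b0=0.70):
    """Return per-period utilization for a sequence of execution-time gains."""
    b = b0
    trace = []
    for g in gains:
        u = min(1.0, g * b)         # assumed platform model: U(k) = G(k) * B(k)
        trace.append(u)
        if controlled:
            b += k_u * (u_ref - u)  # feedback loop on; OPEN leaves b fixed
    return trace


# Hypothetical profile: full-image search, focus-region search, full-image search.
gains = [1.35] * 160 + [1.0] * 105 + [1.35] * 135
open_loop = run(gains, controlled=False)
fc_u = run(gains, controlled=True)

for k in (0, 159, 160, 264, 265, 399):
    print(k, round(open_loop[k], 3), round(fc_u[k], 3))
```

In this idealized model the open-loop utilization simply follows the execution time, while the controlled loop returns to the reference after each change, with a short transient whose size depends on the ratio between the two gains.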

Interestingly, FC-M caused the system to lock up when the execution times increased. This is because FC-M achieved a high utilization (more than 90%) before the execution time increased at around the 160th control period. The utilization then increased to 100% due to the increase in execution times, and the system again locked up due to kernel starvation. In contrast, previous simulation results [24] showed that FC-M could handle such a varying workload, because the impact of CPU over-utilization by the middleware on kernel activities was not modeled in the simulator, which was designed to simulate a scheduler in the OS kernel. In general, FCS/nORB cannot handle a varying workload that even transiently increases the utilization to 100%, due to the starvation of the kernel under such conditions. This result shows a limitation of middleware implementations on top of common general-purpose operating systems (e.g., Linux, Windows, and Solaris) in which real-time scheduling priorities are higher than kernel priorities. On such platforms, the range of variation in utilization that an FCS algorithm can handle is limited by its steady-state utilization before the variation occurs. For example, with a utilization reference of U_s, FC-U can only handle a utilization increase of no more than (1 - U_s) in order to provide robust utilization guarantees. Therefore, the utilization references of FC-U and FC-UM should take this safety margin into account in the face of varying workload. Since FC-M usually achieves a high utilization and, more importantly, does not have control over its safety margin, a middleware implementation of FC-M is less appropriate for time-varying workloads. Experiment II demonstrated that FC-U and FC-UM can provide robust performance guarantees even when task execution times vary (within the aforementioned safety margin) at run time.

3.4. Experiment III: overhead measurement

The feedback control loop for each FCS algorithm introduces overhead.
This overhead is caused by several factors, including the timer associated with FCS, the cost of utilization and miss ratio monitoring, the control computation in the controller, and the rate calculation and communication overhead in the rate assigner. FCS is a viable middleware service only if the overhead it introduces is sufficiently low.

3.4.1. Coarse-grained overhead measurement

To quantify the overhead imposed by the FCS algorithms, we compared the average CPU utilization under different scheduling algorithms when the same workload is applied to the system running on Server A. To limit the overhead caused by utilization monitoring for OPEN and FC-M, average CPU utilizations were measured by setting the control period of the utilization monitor to the duration of the entire run, i.e., the utilization monitor is invoked only twice for each run with FC-M and OPEN: once at the beginning of the run, and once at the end of the run. The average CPU utilization of FC-U and FC-UM was measured by averaging the utilization of each control period, since they need to execute the utilization monitor periodically. To keep the application workload constant, we disabled the rate modulator on the clients so that all tasks always ran at constant rates.

Table 4
Results of coarse-grained overhead measurement
               OPEN      FC-U           FC-M      FC-UM
Util (%)       74.15 ±   74.55 ± 0.42   74.70 ±   75.05 ± 0.16
Overhead (%)   0.54

The results of the overhead measurements are summarized in Table 4. The first row shows the mean of the average utilizations in 8 repeated runs, along with its 90% confidence interval. Each run lasted for 800 s, a total of 200 control periods. The second row shows the overhead of each FCS algorithm in terms of CPU utilization, which is computed by subtracting OPEN's utilization from each FCS algorithm's utilization. The 90% confidence interval of the most efficient algorithm, FC-U, actually overlapped with that of OPEN, which means that FC-U showed no statistically significant overhead based on our measurement. FC-M and FC-UM, however, showed statistically significant overhead compared to OPEN.
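The comparison above relies on a mean with a 90% confidence interval over 8 runs and on checking whether two intervals overlap. A minimal sketch of that calculation is given below. The sample values are hypothetical and are not the measured utilizations; the use of a two-sided Student-t interval is also an assumption, since the text does not say how the intervals were computed.

```python
import math
import statistics

# Two-sided 90% Student-t critical value for 7 degrees of freedom (8 runs).
T_CRIT_90_DF7 = 1.895


def mean_ci90_8runs(samples):
    """Return (mean, half_width) of a 90% confidence interval for 8 samples."""
    if len(samples) != 8:
        raise ValueError("the critical value above is only valid for 8 samples")
    mean = statistics.mean(samples)
    half_width = T_CRIT_90_DF7 * statistics.stdev(samples) / math.sqrt(8)
    return mean, half_width


def intervals_overlap(a, b):
    """True when two (mean, half_width) intervals share at least one point."""
    (mean_a, half_a), (mean_b, half_b) = a, b
    return abs(mean_a - mean_b) <= half_a + half_b


# Hypothetical per-run average utilizations (%), for illustration only.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
candidate = [10.2, 10.4, 10.0, 10.3, 10.1, 10.2, 10.5, 9.9]
print(mean_ci90_8runs(baseline), mean_ci90_8runs(candidate))
print(intervals_overlap(mean_ci90_8runs(baseline), mean_ci90_8runs(candidate)))
```

Overlapping intervals are read here, as in the text, as the absence of a statistically significant difference at this confidence level.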
Over a 4 s control period, all three FCS algorithms introduced overhead of less than 1% of the total CPU utilization. FC-U introduced the least overhead among all FCS algorithms, indicating that the utilization monitor was more efficient than the miss ratio monitor. While the utilization monitor only needs to read the /proc/stat file once every control period, the miss ratio monitor requires time stamping every method invocation twice. FC-UM's overhead is slightly less than the sum of the overheads of FC-M and FC-U. This is because, while FC-UM ran both monitors, it executed the controller and actuator only once per invocation.

3.4.2. Fine-grained overhead measurement

Although the above overhead measurement shows satisfactory results based on a utilization comparison between OPEN and the FCS algorithms, we noticed two limitations of this measurement approach. The first is that the Linux system file /proc/stat records the number of jiffies. Since each jiffy is 10 ms (1/100 of a second), the granularity of the above measurement is too coarse for precise measurement of the overhead. The second is that CPU utilization may suffer interference from the operating system itself, even though we minimized the number of system processes.

To measure overhead more accurately, we adopted a time stamping approach. First, we separated all FCS-related code from the original nORB code. Then two time stamps were taken at the starting point and finishing point of each segment of FCS code to get the execution time of FCS. Fortunately, since most FCS code is within the feedback lane, which runs at the highest Linux real-time priority, the code segment between two time stamps will not be preempted during its execution. Hence, the time-stamped result accurately reflects the real execution overhead.

To achieve fine-grained measurements, we needed an accurate time stamping function. The commonly used gettimeofday system call cannot be used here since this function is also based on a 10 ms scale. Instead we adopted a nanosecond-scale time measuring function called gethrtime.
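For the coarse-grained measurement discussed above, utilization over a control period can be derived from two readings of the aggregate `cpu` line in /proc/stat. The sketch below parses the text of two such readings rather than opening the file, so it can be run anywhere; the function names and the sample readings are hypothetical, and treating only the fourth field (idle) as non-busy time is a simplifying assumption.

```python
def parse_cpu_line(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:]]
            idle = values[3]     # fields: user, nice, system, idle, ...
            total = sum(values)  # assumption: all listed fields add up to total time
            return total - idle, total
    raise ValueError("no aggregate cpu line found")


def utilization(before, after):
    """CPU utilization between two /proc/stat readings, as a fraction."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("readings must be taken at increasing times")
    return (busy1 - busy0) / elapsed


# Hypothetical readings taken 400 jiffies apart (4 s at 10 ms per jiffy).
before = "cpu  1000 0 500 8500\ncpu0 1000 0 500 8500\n"
after = "cpu  1200 0 580 8620\ncpu0 1200 0 580 8620\n"
print(utilization(before, after))
```

With 10 ms jiffies, a 4 s control period spans 400 jiffies, so a single jiffy corresponds to 0.25% utilization, which illustrates why this method is too coarse for measuring sub-millisecond overheads.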
This function uses an OS-specific high-resolution timer, which can be found on Solaris, AIX, Win32/Pentium, Linux/Pentium and VxWorks, to return the number of clock cycles since the CPU was powered up or reset. The gethrtime function has a low overhead and is based on a 64-bit clock cycle counter. By dividing the clock counter value by the CPU speed, we can get reasonably precise and accurate time measurements. Since gethrtime is supported on Pentium processors, we performed our fine-grained overhead measurements on Server B, a Dell Pentium 4 PC.

In Table 5 we list all FCS-related operations and their overheads for the three FCS algorithms. All results in that table are averaged over 10 runs, and each run's result is an average over 300 continuous control periods. Operations 1-4 give the overhead of the utilization monitor, the miss ratio monitor, the controller, and the rate assigner, respectively, all of which ran in a feedback lane at the highest priority. Operation 5 ran in the remote method invocation lane and was used to time stamp each remote method invocation from the client side twice to check whether it meets its deadline, as Section 2.2 explains. The overheads of operations 1-4 are relatively fixed for each control period, while the overhead of operation 5 depends on how many invocations come from the client side in a given control period. The measured overhead for one single gethrtime call is 0.0623 μs. With n invocations in one control period, the overhead for time stamping is 0.1246n μs. In the total row, we assume a common application model with 10 tasks running at a rate of 100 invocations per second. If the control period is 1 s, we get 1000 remote method invocations per control period.

Table 5
Results of differentiated fine-grained overhead measurement
#   Name                  Description                                                           FC-U (μs)   FC-M (μs)   FC-UM (μs)
1   Utilization monitor   /proc/stat system file reading                                        16          N/A         263.22
2   Miss ratio monitor    Deadline miss ratio reading                                           N/A         181.29
3   Controller            Control analysis                                                      40.64       49.84       43.27
4   Rate assigner         Calculating new rate; transmitting new rate to client side            659.90      633.73      637.74
5   Time stamp            Time stamping each remote method invocation twice to check deadline   N/A         0.1246      0.1246
    Total (assuming 1000 remote method invocations in each control period)                      861.44      989.46      1068.82

From Table 5 it is easy to see that FC-U has the lowest overhead and FC-UM has the highest overhead. That observation is consistent with our coarse-grained overhead measurements. It is also interesting to find that the fine-grained overhead result for the FCS algorithms is actually much less than the result of the coarse-grained measurement. The reason is that the coarse-grained measurement is based on 10 ms measurement accuracy, so it is unable to gauge this overhead precisely. Fig. 14 illustrates the overheads of the monitor, controller and rate assigner in the three FCS algorithms; the overhead of time stamping is not included. The rate assigner has the dominant overhead because it involves relatively complicated internal data structure access, modification and socket handling, while there are just several lines of code for the monitor and controller. Overall, the server overhead of all FCS algorithms in our experiments is around 1 ms per control period, which is clearly acceptable in a wide range of real-time and embedded applications.

Fig. 14. Detailed overhead measurement (overhead time in microseconds of the monitor, controller, and assigner for FC-U, FC-M, and FC-UM).

3.4.3. Memory footprint measurement

Besides execution time, memory overhead is also a significant factor for overall system performance. For embedded systems, code size is a major part of the memory footprint because all code of a system is typically loaded into memory before the system starts to execute. Hence, it is useful to measure the code size increase after we plugged in the FCS-related code. The code size of FCS/nORB is 680 KB for the client and 801 KB for the server, while the size of nORB is 602 KB for the client and 723 KB for the server. We see that adding FCS resulted in an increase of only 78 KB on both client and server. The ratio of increase is only 12% and 10% for the client side and server side, respectively. This minor increase is acceptable considering the system performance improvements that were seen in the previous experiments. We note that the combined static footprint on each given endsystem, of both FCS/nORB and the client or server application, is well below the static footprint of a full-featured ORB such as TAO alone (detailed footprint results of nORB and TAO are available in [33]).

4. Related work

Control-theoretic approaches have been applied to various computing and networking systems. A survey of feedback performance control for software services is presented in [3]. Recent research on applying control theory to real-time scheduling is directly related to this paper. For example, Steere et al. developed a feedback-based CPU scheduler [35] that coordinated allocation of the CPU to consumer and supplier threads to guarantee the fill level of buffers. Goel et al. developed feedback-based schedulers [11] that guarantee desired progress rates for real-time applications. Abeni et al. presented an analysis of a reservation-based feedback scheduler in [4].
In our previous work [24,25,38], feedback control algorithms were developed to provide deadline miss ratio and utilization guarantees for real-time applications with unknown task execution times. Feedback control real-time scheduling has also been extended to handle distributed systems [26,34]. For systems requiring discrete control adaptation strategies, hybrid control theory has been adopted to control state transitions among different system configurations [1,16]. Feedback control has also been successfully applied to power control [18] and digital control applications [7]. A key difference between the work presented in this paper and the related work is that we describe the design and evaluation of an FCS service at the ORB middleware layer, while the related work is based either on simulations [1,16,24-26,34,38] or kernel-level implementation [4,7,35].

Adaptive middleware is emerging as a core building block for DRE systems. For example, TAO [31], dynamicTAO [15], ZEN [14], and nORB [33] are adaptive middleware frameworks that can (re)configure various properties of Object Request Broker (ORB) middleware at design time and run time. Higher-level adaptive resource management frameworks, such as QuO [41], Kokyu [10] and RTARM [13], leverage lower-level mechanisms provided by ORB middleware to (re)configure scheduling, dispatching, and other QoS mechanisms in higher-level middleware. ORB services such as the TAO Real-Time Event Service [12] and TAO Scheduling Service [10] offer high-level services for managing functional and real-time properties of interactions between application components. The difference between the work presented in this paper and earlier work on adaptive ORB middleware is that our work integrates a unified control-theoretic framework with a reduced-feature-set ORB (nORB). As a result, our work provides an adaptive middleware service, with real-time performance guaranteed by control theory, to networked embedded systems with storage space and power limitations.
Agilos [19] was an earlier effort on a control-based middleware framework for QoS adaptation in distributed multimedia applications. Our work provides a general framework that is applicable to various real-time applications, whereas Agilos only supports adaptation strategies (e.g., image operations) specific to multimedia applications (e.g., visual tracking). Another project that is closely related to FCS/nORB is ControlWare [40], which is also an incarnation of software performance control at the middleware layer. The difference is that ControlWare embodies adaptation mechanisms (such as server process allocation

in the Apache server) that are tailored for Quality of Service provisioning on Internet servers, while FCS/nORB integrates Feedback Control real-time Scheduling with method invocation mechanisms for real-time embedded systems. In our previous work, we have developed FC-ORB [36], a feedback-controlled middleware for distributed real-time systems. FC-ORB only controls processor utilization and is designed to handle end-to-end tasks, while FCS/nORB controls both processor utilization and deadline miss ratio and focuses on the server side.

WSOA [8] gave a large-scale demonstration of adaptive resource management at multiple architectural levels in a realistic distributed avionics mission computing environment. The WSOA image transmission application is in essence a networked ad hoc control system, with adaptation of image tile compression to meet download deadlines. Based on the WSOA application, a real-time system computing model and theoretical controller has been developed in [37]. Similar to the WSOA project, in this paper we also seek to complement existing middleware projects for DRE systems, and to increase the capabilities offered by DRE middleware as a whole.

5. Conclusions

In summary, we have designed and implemented a Feedback Control real-time Scheduling service atop ORB middleware for distributed real-time embedded systems. Performance evaluation on a physical testbed has shown that (1) FCS/nORB can guarantee specified miss ratio and CPU utilization levels even when task execution times deviate significantly from their estimated values or change significantly at run time; (2) FCS/nORB can provide similar performance guarantees on platforms with different processing capabilities; and (3) the middleware-layer instantiation of performance control loops introduces only a small amount of processing overhead on the client and server. These results demonstrate that a combination of FCS and ORB middleware is a promising approach to achieve robust real-time performance guarantees and performance portability for DRE applications.
Acknowledgement

This work is funded, in part, by the US National Science Foundation under Grant CNS-0720663 and by DARPA under Grant NBCHC030140.

References

[1] S. Abdelwahed, S. Neema, J. Loyall, R. Shapiro, A hybrid control design for QoS management, in: IEEE Real-Time Systems Symposium (RTSS), December 2003.
[2] T.F. Abdelzaher, E.M. Atkins, K.G. Shin, QoS negotiation in real-time systems and its application to automated flight control, IEEE Transactions on Computers 49 (11) (2000).
[3] T.F. Abdelzaher, J.A. Stankovic, C. Lu, R. Zhang, Y. Lu, Feedback performance control in software services, IEEE Control Systems 23 (3) (2003).
[4] L. Abeni, L. Palopoli, G. Lipari, J. Walpole, Analysis of a reservation-based feedback scheduler, in: IEEE Real-Time Systems Symposium (RTSS), December 2002.
[5] S. Brandt, G. Nutt, A dynamic quality of service middleware agent for mediating application resource usage, in: IEEE Real-Time Systems Symposium (RTSS), December 1998.
[6] G. Buttazzo, L. Abeni, Adaptive workload management through elastic scheduling, Real-Time Systems 23 (1-2) (2002).
[7] A. Cervin, J. Eker, B. Bernhardsson, K.-E. Årzén, Feedback-feedforward scheduling of LQG-control tasks, Real-Time Systems Journal 23 (1-2) (2002).
[8] D. Corman, WSOA weapon systems open architecture demonstration using emerging open system architecture standards to enable innovative techniques for time critical target prosecution, in: IEEE/AIAA Digital Avionics Systems Conference (DASC), October 2001.
[9] J. Eker, Flexible embedded control systems: design and implementation, Ph.D. thesis, Lund Institute of Technology, December 1999.
[10] C.D. Gill, D.L. Levine, D.C. Schmidt, The design and performance of a real-time CORBA scheduling service, Real-Time Systems Journal 20 (2) (2001).
[11] A. Goel, J. Walpole, M. Shor, Real-rate scheduling, in: IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), May 2004.
[12] T.H. Harrison, D.L. Levine, D.C. Schmidt, The design and performance of a real-time CORBA event service, in: ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), October 1997.
[13] J. Huang, Y. Wang, F. Cao, On developing distributed middleware services for QoS- and criticality-based resource negotiation and adaptation, Real-Time Systems Journal, Special Issue on Operating Systems and Services 16 (2) (1999).
[14] R. Klefstad, D.C. Schmidt, C. O'Ryan, Towards highly configurable real-time object request brokers, in: IEEE/IFIP International Symposium on Object-Oriented Real-time Distributed Computing (ISORC), March 2002.
[15] F. Kon, F. Costa, G. Blair, R. Campbell, The case for reflective middleware, Communications of the ACM 45 (6) (2002).
[16] X. Koutsoukos, R. Tekumalla, B. Natarajan, C. Lu, Hybrid supervisory control of real-time systems, in: IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), March 2005.
[17] C. Lee, J. Lehoczky, D. Siewiorek, R. Rajkumar, J. Hansen, A scalable solution to the multi-resource QoS problem, in: IEEE Real-Time Systems Symposium (RTSS), December 1999.
[18] C. Lefurgy, X. Wang, M. Ware, Server-level power control, in: IEEE International Conference on Autonomic Computing (ICAC), June 2007.
[19] B. Li, K. Nahrstedt, A control-based middleware framework for quality of service adaptations, IEEE Journal on Selected Areas in Communications, Special Issue on Service Enabling Platforms 17 (9) (1999).
[20] C.L. Liu, J.W. Layland, Scheduling algorithms for multiprogramming in a hard real-time environment, Journal of ACM 20 (1) (1973).
[21] J.W.S. Liu, Real-Time Systems, Prentice Hall, 2000.
[22] J.W.S. Liu et al., Algorithms for scheduling imprecise computations, IEEE Computer 24 (5) (1991).
[23] J. Loyall et al., Comparing and contrasting adaptive middleware support in wide-area and embedded distributed object applications, in: IEEE International Conference on Distributed Computing Systems, April 2001.
[24] C. Lu, J.A. Stankovic, G. Tao, S.H. Son, Feedback control real-time scheduling: framework, modeling, and algorithms, Real-Time Systems Journal, Special Issue on Control-theoretical Approaches to Real-Time Computing 23 (1-2) (2002).
[25] C. Lu, X. Wang, C.D. Gill, Feedback control real-time scheduling in ORB middleware, in: IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), May 2003.
[26] C. Lu, X. Wang, X.K. Koutsoukos, End-to-end utilization control in distributed real-time systems, in: International Conference on Distributed Computing Systems (ICDCS), March 2004.
[27] W.K. Pratt, Digital Image Processing, John Wiley & Sons, New York, 1978.
[28] Public Netperf Homepage, <http://www.netperf.org>.
[29] D. Rosu, K. Schwan, S. Yalamanchili, R. Jha, On adaptive resource allocation for complex real-time applications, in: IEEE Real-Time Systems Symposium, December 1997.
[30] D.C. Schmidt, The ADAPTIVE communication environment: an object-oriented network programming toolkit for developing communication software, in: 12th Annual Sun Users Group Conference, December 1993.
[31] D.C. Schmidt et al., TAO: a pattern-oriented object request broker for distributed real-time and embedded systems, IEEE Distributed Systems Online 3 (2), February 2002. http://dsonline.computer.org/middleware.
[32] D. Seto, J.P. Lehoczky, L. Sha, K.G. Shin, On task schedulability in real-time control systems, in: IEEE Real-Time Systems Symposium (RTSS), December 1996.
[33] V. Subramonian, G. Xing, C.D. Gill, C. Lu, R. Cytron, Middleware specialization for memory-constrained networked embedded systems, in: IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), May 2004.
[34] J.A. Stankovic et al., Feedback control real-time scheduling in distributed real-time systems, in: IEEE Real-Time Systems Symposium (RTSS), December 2001.
[35] D.C. Steere et al., A feedback-driven proportion allocator for real-rate scheduling, in: Symposium on Operating Systems Design and Implementation (OSDI), February 1999.
[36] X. Wang, Y. Chen, C. Lu, X. Koutsoukos, FC-ORB: a robust distributed real-time embedded middleware with end-to-end utilization control, Elsevier Journal of Systems and Software 80 (7) (2007).
[37] X. Wang, M. Chen, H. Huang, V. Subramonian, C. Lu, C. Gill, CAMRIT: control-based adaptive middleware for real-time image transmission, IEEE Transactions on Parallel and Distributed Systems 19 (6) (2008).
[38] X. Wang, D. Jia, C. Lu, X. Koutsoukos, DEUCON: decentralized end-to-end utilization control for distributed real-time systems, IEEE Transactions on Parallel and Distributed Systems 18 (7) (2007).
[39] L.R. Welch, B. Shirazi, B. Ravindran, Adaptive resource management for scalable, dependable real-time systems: middleware services and applications to shipboard computing systems, in: IEEE Real-time Technology and Applications Symposium (RTAS), June 1998.
[40] R. Zhang, C. Lu, T.F. Abdelzaher, J.A. Stankovic, ControlWare: a middleware architecture for feedback control of software performance, in: International Conference on Distributed Computing Systems (ICDCS), July 2002.
[41] J.A. Zinky, D.E. Bakken, R. Schantz, Architectural support for quality of service for CORBA objects, Theory and Practice of Object Systems 3 (1) (1997).