Dispatch-and-Search: Dynamic Multi-Ferry Control in Partitioned Mobile Networks



Ting He, Ananthram Swami, and Kang-Won Lee
IBM Research, Hawthorne, NY 10532 USA {the,kangwon}@us.ibm.com
Army Research Laboratory, Adelphi, MD 20783 USA aswami@arl.army.mil

ABSTRACT

We consider the problem of disseminating data from a base station to a sparse, partitioned mobile network by controllable data ferries with limited ferry-node and ferry-ferry communication ranges. Existing solutions to data ferry control mostly assume the nodes to be stationary, which reduces the problem to designing fixed ferry routes. In the more challenging scenario of mobile networks, existing solutions have focused on single-ferry control and left out an important issue of ferry cooperation in the presence of multiple ferries. In this paper, we jointly address the issues of ferry navigation and cooperation using the approach of stochastic control. Under the assumption that ferries can communicate within each partition, we propose a hierarchical control system called Dispatch-and-Search (DAS), consisting of a global controller that dispatches ferries to individual partitions and local controllers that coordinate the search for nodes within each partition. Formulating the global and the local control as Partially Observable Markov Decision Processes (POMDPs), we develop efficient control policies to optimize the (discounted) total throughput, which significantly improve on the performance of their predetermined counterparts in cases of limited prior knowledge.

Categories and Subject Descriptors

C.2.1 [Computer-Communication Networks]: Network Architecture and Design—store and forward networks; G.4 [Mathematical Software]: algorithm design and analysis; I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search—control theory; I.2.9 [Artificial Intelligence]: Robotics—autonomous vehicles

Research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-06-3-0001. The views and conclusions are those of the authors and do not represent the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MobiHoc'11, May 16-20, 2011, Paris, France. Copyright 2011 ACM 978-1-4503-0722-2/11/05...$10.00.

General Terms

Algorithms, Design, Performance

Keywords

Data ferry control, Partially Observable Markov Decision Processes, Myopic control policies

1. INTRODUCTION

The development of robotic technology has led to improvements in many applications, a recent one being the use of controllable mobile nodes to assist communications in sparse networks. Sparse network topologies are routinely encountered in a broad variety of applications, e.g., sensor/actuator networks, search-and-rescue operations, underwater networks, spacecraft networks ("interplanetary communications"), vehicular ad hoc networks (VANETs), and military mobile ad hoc networks (MANETs). These sparse networks pose significant challenges to traditional communication schemes, since existing in-network solutions such as routing and delay-tolerant communications are only designed for (intermittently) connected networks and will fail if the network is permanently partitioned. In such partitioned networks, the only remedy is to look for external help, and mobile helper nodes called data ferries, usually mounted on controllable platforms such as UAVs, stand out as a preferred solution in many scenarios due to their agility in contrast to fixed infrastructures. The problem, though, is that proper control has to be in place to leverage these data ferries effectively.
Previous solutions to data ferry control have mostly focused on designing fixed ferry routes [3, 5, 12, 13] for static networks. Although the approach still applies to mobile networks [9], the performance is generally suboptimal, as fixed routes can only leverage long-term statistics of node movements. To fully utilize the ferries' capability, we have shown in [2] that each ferry needs to dynamically adjust its movements based on run-time observations. As in [2], we assume a ferry can only observe nodes within a certain distance defined by the ferry-node communication range, referred to as partial observations¹. Although it shows promising performance improvements over fixed routes, the solution in [2] is limited by (i) only considering a single ferry and (ii) only focusing on contact rate optimization. The availability of multiple ferries raises a new challenge because one ferry's optimal movements will depend on the movements of the other ferries, which induces a need for the ferries to cooperate. While this is not an issue under unlimited ferry-ferry communications, global cooperation can be expensive, especially for networks with scattered partitions. Furthermore, although contact rates are performance indicators, the ultimate goal of the ferries should be to optimize the end-to-end communication performance perceived by nodes, which also depends on other factors such as traffic demands. These new challenges call for a new solution that can efficiently utilize multiple ferries to optimize the end-to-end performance without excessive cooperation overhead.

¹This is in contrast to assuming complete observations of nodes in the entire network as in [1, 11, 14].

1.1 Related Work

Designated communication nodes, typically called data ferries or data mules, have been considered as an effective solution to support communications in sparse networks. Data ferries physically carry data either between task (i.e., non-ferry) nodes [3, 11-14] or between nodes and a base station [1, 4-6]. The main issue in using such ferries is how to control their mobility, where existing solutions can be divided into single-ferry control [1, 2, 4, 9, 14] and multi-ferry control [3, 5, 6, 12, 13]. An in-between case was studied in [11], where nodes can switch between the roles of ferry and non-ferry. Most existing work focuses on ferry route design under the assumption that either the task nodes are stationary [3-5, 12, 13], or their locations are always known by the ferries if they are mobile (i.e., complete observations) [11, 14]. A related problem of static but randomly distributed nodes has been studied: using predetermined routes in [6] and dynamic route adaptation based on complete observations in [1]². The problem of stochastic node movements and partial observability was first studied in [9] using predetermined routes and later improved in [2] using a control policy that navigates the ferry dynamically.

1.2 Our Approach and Results

We consider controlling multiple ferries in partitioned mobile networks to optimize the total throughput.
We present our solution in the scenario of data dissemination, although our approach extends naturally to the cases of data harvesting and peer-to-peer communications (see the discussion in Section 7). Our specific contributions are:

Flexible control framework: To avoid long-range ferry-ferry communications, we propose a hierarchical control framework called Dispatch-and-Search (DAS), which divides the problem into the global control of allocating ferries among partitions and the local control of searching for nodes within each partition. Since the global control can be implemented by the base station (BS), ferries only need to communicate within each partition to cooperate in local control.

Rigorous problem formulation: Under Markovian node mobility, we show that both the global and the local control problems can be formulated as special cases of the Partially Observable Markov Decision Process (POMDP). The POMDP model provides a comprehensive representation of the available information, control options, and performance objective in a dynamic system, which provides a foundation for rigorous optimization.

Efficient solutions: Based on the POMDP formulation, we develop solutions to global and local control in the form of dynamic control policies. Due to the intrinsic hardness of POMDP, we focus on an efficient heuristic called the myopic policy. In particular, in global control, even the myopic policy can be computationally expensive; hence, we also develop two approximations with similar performance but much lower complexity than the exact solution.

[Figure 1: Example: K = 3 ferries serve D = 2 domains in dispatch-and-search mode. r: (projected) ferry communication radius.]

²Both works treat each message as a new node which arrives randomly in space and time; [1] also assumes locations and arrival times of pending messages are known at run time.

Performance insights: We evaluate the proposed policies through both analysis and simulations. We obtain closed-form upper and lower bounds on the optimal local performance, as well as an upper bound on the optimal global performance which can be evaluated numerically. Moreover, we compare our solutions with predetermined control through extensive simulations, which show significant performance gains (e.g., 10%+ on discounted throughput and 40%+ on undiscounted throughput) in cases of minimum prior knowledge.

We assume Markovian mobility models to rigorously represent the impact of past actions on future decisions. Although a limiting assumption, it includes popular models such as random walk/waypoint and their variations as special cases, and can be viewed as a second-order approximation of general mobility models.

The rest of the paper is organized as follows. Section 2 specifies the network model and the DAS framework, Section 3 gives a brief overview of POMDP, Sections 4 and 5 address the local and the global control problems respectively, Section 6 presents the simulation evaluation, and Section 7 concludes the paper.

2. PROBLEM FORMULATION

2.1 Network Model

Suppose that there is a total of K ferries and D network partitions, each partition occupying a disjoint region called a domain, as illustrated in Fig. 1. A base station (BS) generates traffic for all nodes at constant rates³ λ = (λ_d)_{d=1}^D (fluid model), which is then delivered to the respective domains by the ferries in a delay-tolerant manner. In each domain, we select a node as the gateway to the ferries, which then disseminates the received traffic within the domain using in-network routing; we will simply call these gateway nodes "nodes". According to the ferry-node communication range, each domain is partitioned into N_d (d = 1, ..., D) cells denoted by a set S_d (S_d ∩ S_d' = ∅ for d ≠ d'), such that the transmission of each ferry can cover one cell at a time. We assume that node mobility in domain d can be modeled as a Markov chain on S_d with transition matrix P_d. We also assume that ferry-ferry communications can span an entire domain but not across domains. The total network field is denoted by S := ∪_{d=1}^D S_d, of size |S| = N = Σ_{d=1}^D N_d. We do not restrict the relationships between K, D, and N_d, although restrictions may apply in later analysis (e.g., Section 5.3).

³We use x to denote scalars and boldface x to denote vectors.

[Figure 2: DAS procedure: Dispatch control occurs at the time-scale of rounds (T + 1 slots) and search control at the time-scale of slots.]

2.2 System Architecture

Due to the ferry-ferry communication constraint, ferries can only cooperate locally within a domain. Accordingly, we propose a hierarchical control system called Dispatch-and-Search (DAS), consisting of a global dispatch controller π_g at the BS, in charge of allocating ferries to individual domains, and a local search controller π_{l,d} for each domain d, located at an arbitrarily selected lead ferry, in charge of jointly navigating the ferries dispatched to that domain. Specifically, the lead ferry will collect observations from the other ferries in its domain and broadcast the navigation actions determined by the search controller, in terms of the set of cells to cover (and their matching with the ferries) in each slot, in order to contact the gateway node (by any ferry) and deliver traffic. We assume that all ferries dispatched to the same domain carry the same traffic. As illustrated in Fig. 2, dispatch control operates periodically with period T + 1, called a round (T is a design parameter); the first T slots are used to search for the gateway nodes in the individual domains; the last slot is used for all the ferries to return to the BS and pick up new traffic, after which the next round begins⁴.

2.3 Main Problem: DAS Control

Let λ^π_β := E[Σ_t N^π(t) β^t] denote the discounted total throughput for a given discount factor β ∈ (0, 1) under a DAS control policy π := (π_g, (π_{l,d})_{d=1}^D), where N^π(t) is the amount of data delivered (over all domains) at time t under policy π. Our goal is to design a policy π that maximizes λ^π_β.
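As a quick numerical check of the discount geometry behind this definition (an illustrative sketch with made-up values, not from the paper): the discounted total throughput and the delivery-rate form differ only by the constant factor (1 − β)/β, since Σ_{t≥1} β^t = β/(1 − β).

```python
# Illustrative check of the discount geometry (values are hypothetical).
beta = 0.9
deliveries = {3: 5.0, 7: 2.0, 12: 4.0}  # hypothetical N(t): data delivered at slot t

lam = sum(n * beta**t for t, n in deliveries.items())  # discounted total throughput
geom = sum(beta**t for t in range(1, 10_000))          # ~ beta / (1 - beta)
rate = lam / geom                                      # delivery-rate form

assert abs(geom - beta / (1 - beta)) < 1e-6
assert abs(rate - lam * (1 - beta) / beta) < 1e-9
```

This is why maximizing the cumulative discounted throughput and maximizing the delivery rate are equivalent objectives.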
Remarks: The discounts are used to model the objective of delivering the data as soon as possible. The above definition is a cumulative throughput, but it represents a delivery rate, defined as the ratio of the delivered traffic and the delivery time λ̄^π_β := E[(Σ_t N^π(t)β^t)/(Σ_t β^t)], in that λ̄^π_β = λ^π_β (1 − β)/β.

3. PRELIMINARY

3.1 POMDP Recap

POMDP [8] is a general framework to model the control of a stochastic system. We provide a quick overview of POMDP in this section. A POMDP is represented by⁴ a tuple (X, A, O, r, P_x, P_o), where X represents the possible states of the controlled system, A the set of control actions, O the observations of the controller, and r(x, a, o) a reward function modeling the performance criterion based on the state x ∈ X, the action a ∈ A, and the observation o ∈ O. The other elements P_x and P_o describe how the state evolves (P_x(x, a, x′) := Pr{x_{t+1} = x′ | x_t = x, a_t = a}) and how the observation is generated (P_o(x, a, o) := Pr{o_t = o | x_t = x, a_t = a}), and are derived from the first three elements in a given problem setting.

⁴The above assumes that the ferries can move between the BS and any domain in one slot. Longer BS-domain distances can be modeled by replacing the last slot by T′ slots during which a round trip can be completed; all the results hold by replacing T + 1 with T + T′.

Under partial observability, the controller does not know the true state x_t precisely, but can only estimate it up to a distribution b_t = (b_t(x))_{x∈X}, called the belief, based on the action and observation history. The objective of the controller is to find a policy π that determines the action to take based on the current belief so as to maximize the total reward over a time window. With an infinite horizon, a common form of the total reward is the discounted reward

R^π := E[Σ_t β^t r(x_t, a^π_t, o_t)]   (1)

for a fixed discount factor β ∈ (0, 1). The optimal policy π that maximizes R^π is known to be the solution to the Bellman equation:

V(b) = max_a E[r(b, a, o) + β V(φ(b, a, o))],   (2)

where V(b) := max_π R^π|_{b_1 = b}, called the value function, is the maximum reward starting from a given belief state b, the expectation is over P_o(b, a, o) := Σ_x b(x) P_o(x, a, o), and the function b′ = φ(b, a, o) generates a new belief for the next step based on the previous belief, the action, and the observation. It is known that solving (2) for the optimal policy is PSPACE-hard in general [7]. A popular alternative is the myopic policy, in which one only maximizes the average immediate reward, given by

π_m(b) := arg max_a Σ_x b(x) E[r(x, a, o)],   (3)

where the expectation is over P_o(x, a, o). The myopic policy is easy to implement and has exhibited excellent performance in single-ferry control [2]. We will focus on this policy in the sequel and examine its different forms and performance at different levels of control.

3.2 Applying POMDP to Multi-Ferry Control

The use of POMDP is natural for the problem under consideration. Under the DAS architecture, the goal of multi-ferry control is to jointly design the global dispatch controller and the local search controllers such that the overall (discounted) throughput is optimized. The challenge is that both controllers face a dynamic environment that changes over time, e.g., the search controller relies on node distributions, which evolve due to node mobility, and the dispatch controller relies on (predicted) contact processes as well as traffic demands for each domain. Moreover, characteristics of this dynamic environment are only partially observable to the controllers due to the limited ferry-node communication range. POMDP is well suited for addressing these challenges using its model of dynamic systems under partial observability.

The local and the global controllers have inherent connections: the global action affects the local control because the number of ferries dispatched to a domain will largely
determine how long it takes to contact the node; the local performance also affects the global control because the dispatch controller needs to weigh the impact of dispatching more/fewer ferries to each domain to make a balanced decision. Other parameters such as domain sizes, node mobility models, and traffic rates all play a role in the performance. How can we capture all these factors in the POMDP framework? What reward functions should we use for local/global control so that together they achieve throughput optimization? These are the main questions we will answer in the sequel.

4. LOCAL CONTROL: SEARCH

Consider a domain d to which k ∈ {1, ..., K} ferries have been dispatched in the current round; assume k ≤ N, the size of the domain (the subscript d is dropped). Given the initial node distribution b and mobility model P, the goal of the local controller π_l is to jointly navigate the ferries to contact the node and deliver traffic. Moreover, due to the discount in the throughput calculation, it is intuitively desirable to make the contact as soon as possible. In this section, we cast the problem of local control as a POMDP and establish lower and upper bounds on the optimal reward, where the lower bound is provided by the myopic policy. Based on the bounds, we discuss conditions under which the myopic policy is close to optimal.

4.1 Local Control as a POMDP

In the language of POMDP, the problem can be formulated as follows:

1. Local state: x^l_t = (s_t, δ_t), where s_t ∈ S is the node's location at time t, and δ_t ∈ {0, 1} indicates whether the next contact will be the first in the current round (δ_t = 1) or not (δ_t = 0);

2. Local action: a^l_t ⊆ S with |a^l_t| ≤ k denotes the set of cells for the ferries to cover in slot t (no two ferries cover the same cell);

3. Local observation: o^l_t = z_t ∈ {0, 1} is a joint contact indicator, i.e., z_t = I_{s_t ∈ a^l_t}, where I denotes the indicator function;

4. Local reward: the one-time reward r_l(x^l_t, a^l_t, o^l_t) = δ_t z_t gives a unit reward for the first contact, after which δ_t transits to 0 and no further reward occurs; the overall reward R^{π_l} is defined as in (1) for r_l.

We elaborate on the preceding POMDP formulation: let Υ^{π_l} denote the time till (the first) contact (TTC) under policy π_l.
It is easy to see that R^{π_l} = E[β^{Υ^{π_l}}]. Since β^x is a decreasing function, maximizing R^{π_l} leads to minimizing the TTC in a discounted average sense⁵. Intuitively, the local reward represents the weight each unit of delivered traffic will contribute to the discounted total throughput λ_β, starting from the current round. Later we will show that local policies optimizing this reward help to optimize the overall λ_β under the DAS framework (see Section 5.1).

⁵The convexity of β^x implies that the mean TTC satisfies E[Υ^{π_l}] ≥ log R^{π_l} / log β.

4.2 Local Control Policy

The true state x^l_t is not directly known to the controller since the node location s_t is unknown. Instead, the controller observes another state y^l_t, which consists of the belief b_t of s_t (b_t(s) := Pr{s_t = s | a^l_1, ..., a^l_{t−1}, o^l_1, ..., o^l_{t−1}}) and the indicator δ_t. At the end of the slot, the new state y^l_{t+1} = (b_{t+1}, δ_{t+1}) transits as⁶

b_{t+1} = P^T (z_t e_{s_t} + (1 − z_t) b_t\a^l_t),
δ_{t+1} = δ_t (1 − z_t),   (4)

where e_{s_t} or b_t\a^l_t is the updated belief based on the observation z_t, and multiplying it by the transposed transition matrix P^T gives the predicted belief for the next slot. The myopic search policy is the one that maximizes the probability of an immediate contact:

π*_l(b) = arg max_{a^l} Σ_{s ∈ a^l} b(s),   (5)

i.e., the ferries will search the k cells with the top probabilities in the belief.

⁶Here P^T denotes matrix transpose, e_s the unit vector with one in the s-th element, and b\a the posterior belief after a miss, given by b\a(s) = 0 for all s ∈ a and b\a(s) = b(s)/(Σ_{s′∉a} b(s′)) otherwise.

4.3 Performance of Local Control

For a given search policy π (the subscript l is dropped for simplicity), let Υ^π denote the TTC under π for a given initial belief b and k ferries. The distribution of the random variable Υ^π can be characterized as follows. Conditioned on the event that no contact has occurred before t (i.e., the previous t − 1 slots are misses), let a^π_t and b^π_t denote the action and the belief under policy π in slot t, and p^π_t the conditional probability of contact. By definition, we have p^π_t = Σ_{s ∈ a^π_t} b^π_t(s), and a^π_t, b^π_t can be computed from the following updates: starting from b^π_1 = P^T b,

a^π_t = π(b^π_t),  b^π_{t+1} = P^T (b^π_t\a^π_t),  t = 1, 2, ...   (6)

Based on p^π_t, it is easy to see that the distribution of Υ^π and the corresponding R^π are given by

Pr{Υ^π = t} = p^π_t Π_{j=1}^{t−1} (1 − p^π_j),   (7)

R^π = Σ_t β^t Pr{Υ^π = t} = Σ_t β^t p^π_t Π_{j=1}^{t−1} (1 − p^π_j).   (8)

In particular, plugging in a^π_t = π*_l(b^π_t) for π*_l in (5) yields the reward R^{π*} of the myopic policy. Although the above method will give the reward for any given policy, it does not indicate how close its performance is to the optimal. Characterization of the optimal reward R* is an open question, since it is computationally intractable to compute the optimal policy. Hence, we derive the following lower and upper bounds on the optimal reward. The lower bound comes naturally from the myopic policy. For the upper bound, consider the passive belief transitions without any observation: b^{(t)} := (P^T)^t b. Let B_{k,t} := max_{|a|=k} Σ_{s ∈ a} b^{(t)}(s) be the sum of the k largest elements in b^{(t)}.
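The recursions for the myopic search policy and its reward, Eqs. (5)-(8), can be sketched in a few lines of Python (an illustrative sketch, not the authors' code; function and variable names are ours, and we truncate the infinite sum at a finite horizon):

```python
import heapq

def myopic_search_reward(P, b0, k, beta, horizon=200):
    """Reward R = sum_t beta^t Pr{TTC = t} of the myopic search policy.
    P[i][j] = Pr{node moves from cell i to cell j}; b0 = initial belief;
    k = number of ferries. A sketch of Eqs. (5)-(8)."""
    n = len(b0)

    def predict(b):  # b' = P^T b: belief after one step of node mobility
        return [sum(P[i][j] * b[i] for i in range(n)) for j in range(n)]

    b = predict(b0)                # b_1 = P^T b0, as in (6)
    reward, miss_prob = 0.0, 1.0   # miss_prob = prod_{j<t} (1 - p_j)
    for t in range(1, horizon + 1):
        a = set(heapq.nlargest(k, range(n), key=lambda s: b[s]))  # top-k cells, (5)
        p = sum(b[s] for s in a)                 # p_t: conditional contact probability
        reward += beta**t * miss_prob * p        # beta^t * Pr{TTC = t}, (7)-(8)
        miss_prob *= 1.0 - p
        rest = 1.0 - p
        if rest <= 1e-12:                        # node located with certainty
            break
        # posterior after a miss: zero out searched cells, renormalize, predict
        b = predict([0.0 if s in a else b[s] / rest for s in range(n)])
    return reward
```

For a sanity check: on a uniform 4-cell random walk with k = 1, the belief stays uniform, so p_t = 1/4 every slot and the sum is geometric, R = (β/4)/(1 − 3β/4).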

Define

R̄ := Σ_{t=1}^{t₀−1} β^t B_{k,t} + β^{t₀} (1 − Σ_{j=1}^{t₀−1} B_{k,j}),   (9)

where t₀ := inf{t ≥ 1 : 1 − Σ_{j=1}^{t−1} B_{k,j} ≤ B_{k,t}}. We have the following result.

Theorem 4.1. The reward R* of the optimal search policy satisfies: R^{π*} ≤ R* ≤ R̄.

To prove the theorem, we introduce the following lemmas; their proofs can be found in the Appendix.

Lemma 4.2. For any policy π, its reward R^π is monotone increasing with the conditional contact probability p^π_t for any t.

Lemma 4.3. For any t and any π,

p^π_t ≤ B_{k,t} / max(1 − Σ_{j=1}^{t−1} B_{k,j}, B_{k,t}) =: p̄_t.   (10)

Proof: (Theorem 4.1) The lower bound holds trivially. For the upper bound, Lemma 4.2 implies that substituting p̄_t from Lemma 4.3 into (8) gives an upper bound on the reward:

R^π ≤ Σ_t β^t p̄_t Π_{j=1}^{t−1} (1 − p̄_j) = Σ_{t=1}^{t₀−1} β^t B_{k,t} + β^{t₀} (1 − Σ_{j=1}^{t₀−1} B_{k,j}) = R̄,

which implies R* ≤ R̄, as the bound holds for any π. Note that Π_{j=1}^{t−1} (1 − p̄_j) = 1 − Σ_{j=1}^{t−1} B_{k,j} for t ≤ t₀.

Closed-form bounds can be obtained by further bounding R^{π*} and R̄ (see the Appendix for proofs).

Corollary 4.4. For k ferries and a domain of size N (cells), R^{π*} ≥ kβ/[N(1 − β) + kβ].

In general, there is no non-trivial closed-form upper bound, i.e., R̄ can approach β arbitrarily closely. Under certain conditions, however, we have the following result.

Corollary 4.5. If the initial belief is the steady-state distribution b_∞ (b_∞ = P^T b_∞), then

R̄ = β^{t₀} [1 − B_{∞,k}(t₀ − 1)] + B_{∞,k} (β − β^{t₀})/(1 − β),   (11)

where B_{∞,k} := max_{|a|=k} Σ_{s ∈ a} b_∞(s) and t₀ = ⌈B_{∞,k}^{−1}⌉. Moreover, if P is doubly stochastic⁷, then the above reduces to R̄ ≤ kβ/[(1 − β)N].

⁷That is, each column of P also sums up to one.

Remark: For a large domain (N → ∞), Corollary 4.4 implies that R* ⪆ kβ/[(1 − β)N], achievable by the myopic search policy. The worst case is when the node moves equally likely in all directions (i.e., P is doubly stochastic) and the initial distribution is uniform, under which Corollary 4.5 says kβ/[(1 − β)N] is also an upper bound. In this case, the optimal reward R* ≈ kβ/[(1 − β)N]. Note that the steady-state distribution b_∞ is uniform if and only if P is doubly stochastic.

Similar analysis can also be performed for the mean TTC E[Υ^π]. Given a policy π, the mean TTC can be computed by E[Υ^π] = Σ_t t p^π_t Π_{j=1}^{t−1} (1 − p^π_j). Analogous to Lemma 4.2, it can be shown that E[Υ^π] is monotone decreasing with each p^π_t.
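The upper bound in (9) is easy to evaluate numerically; a sketch (our naming, not the paper's code) that computes B_{k,t} from the passive beliefs and stops at t₀:

```python
def reward_upper_bound(P, b0, k, beta, t_max=1000):
    """Sketch of the closed-form upper bound (9): B_{k,t} is the sum of the
    k largest entries of the passive belief (P^T)^t b0; t0 is the first slot
    at which the remaining probability mass fits under B_{k,t}."""
    n = len(b0)

    def predict(b):  # one passive mobility step: b' = P^T b
        return [sum(P[i][j] * b[i] for i in range(n)) for j in range(n)]

    b, acc, bound = b0, 0.0, 0.0   # acc = sum_{j<t} B_{k,j}
    for t in range(1, t_max + 1):
        b = predict(b)
        B = sum(sorted(b, reverse=True)[:k])
        if 1.0 - acc <= B:          # t0 reached: all remaining mass counted here
            return bound + beta**t * (1.0 - acc)
        bound += beta**t * B
        acc += B
    return bound
```

On the uniform 4-cell example with k = 1 and β = 0.9, B_{1,t} = 1/4 for every t, t₀ = 4, and the bound evaluates to about 0.774, which indeed exceeds the myopic reward ≈ 0.692 from the geometric sum.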
Accordingly, we can bound the minimum mean TTC by

E[Υ] ≤ min_π E[Υ^π] ≤ E[Υ^{π*}],   (12)

where E[Υ^{π*}] is the mean TTC of the myopic policy, bounded by E[Υ^{π*}] ≤ N/k, and E[Υ] is an analytical lower bound analogous to R̄, given by E[Υ] = Σ_{t=1}^{t₀−1} t B_{k,t} + t₀ (1 − Σ_{t=1}^{t₀−1} B_{k,t}). Under the conditions that b = b_∞ and P is doubly stochastic, E[Υ] can be relaxed to E[Υ] ≥ (1/2)(1 + N/k). Note that the optimal policy that maximizes R^π may not achieve the minimum E[Υ^π].

5. GLOBAL CONTROL: DISPATCH

The global control operates on top of the local control at the macro time-scale of rounds. Given the local search policies, the goal of the global dispatch policy is to allocate the ferries among the domains at the beginning of each round to maximize the total throughput. In this section, we formulate the global control as another POMDP, based on which we develop an approximate dispatch policy with a significantly lower complexity than the original and analyze its performance.

5.1 Global Control as a POMDP

The global control problem can be formulated as the following POMDP:

1. Global state: x^g_τ = (s_τ, m_τ), where s_τ = (s_τ(d))_{d=1}^D still denotes the node locations in each domain, and m_τ = (m_τ(d))_{d=1}^D denotes the buffer state of the BS, where m_τ(d) is the amount of traffic generated for domain d, all at the beginning of round τ (i.e., at time (T + 1)(τ − 1));

2. Global action: a^g_τ = (a^g_τ(d))_{d=1}^D represents the distribution of the ferries among the domains in round τ (i.e., a^g_τ(d) ≥ 0 and Σ_{d=1}^D a^g_τ(d) = K);

3. Global observation: o^g_τ = (υ_τ, b_τ), where υ_τ = (υ_τ(d))_{d=1}^D denotes the TTCs for each domain, and b_τ = (b_{τ,d})_{d=1}^D the updated beliefs of the node locations at the end of round τ (at time (T + 1)τ); if there is no contact with a domain, either because no ferry serves the domain (a^g_τ(d) = 0) or because the ferries fail to contact the node within time T, define υ_τ(d) := ∞;

4. Global reward: the one-time reward is defined as r_g(x^g, a^g, o^g) := Σ_{d=1}^D m(d) β^{υ(d)}, whereas the overall reward R^{π_g} has a slightly different form from (1):

R^{π_g} := E[Σ_{τ=1}^∞ β^{(T+1)(τ−1)} r_g(x^g_τ, a^{π_g}_τ, o^g_τ)].   (13)

We now explain the meaning of this reward function.

Lemma 5.1. The global reward R^{π_g} is equal to the discounted total throughput λ_β (see Section 2.3) under the dispatch policy π_g and the associated search policies.

Proof: For a domain d, delivery occurs if and only if here is a conac wihin, each conaining m τ(d) daa (assuming he enire ferry buffer conen can be delivered in one sho) a ime ( + 1)(τ 1) + υ τ(d). Is hroughpu is hus E[ τ=1 mτ(d) β( +1)(τ 1)+υτ(d) ]. Summing over all domains gives = λ β. Remars: Our conrol objecive of maximizing λ β is hus convered ino maximizing a global level via ferry allocaion a g τ and maximizing he weighs E[β υ(d) ] a local level via ferry navigaion a l in each domain (under given a g τ). Noe ha he laer is consisen wih he local reward 8 in Secion 4.1. Here he period is inroduced o faciliae synchronizaion among ferries. Is selecion involves a radeoff: a smaller means lower probabiliies of conac per round bu more rounds during a given ime, whereas a larger means higher probabiliies of conac per round bu fewer rounds. he exac value can be uned o opimize he overall performance (see Fig. 8 and 11). 5.2 Global Conrol Policy Le y g τ denoe he observed sae. Since one par (s τ) of he sae x g τ is parially observable and he oher par (m τ) is compleely observable, we have a mixed sae y g τ := (b τ, m τ), where b τ = (b τ,d ) D are he beliefs of node locaions s τ as repored by he local conrollers. A he end of each round, he new beliefs b τ+1 will go hrough a sequence of ransiions depending on he acions and observaions of he local conrollers by (4). Insead of repeaing hose deails, we simply consider he oupu b τ as par of he global observaion such ha b τ+1 = b τ. he BS buffer sae m τ+1 ransis differenly for each domain depending on wheher he domain receives service or no: m τ+1(d) = λ d ( + 1) + m τ(d)i υτ(d)>. (14) he global conrol is much harder o solve han he local conrol due o he large soluion space wih A = ( ) K+D 1 K acions. he dispach policy is given by: D πg (y g ) := arg max m(d)e[β υ(d) ]. (15) a g Le Υ(d) denoe he C in domain d wihou deadline consrain, whose disribuion is given by (7) for iniial node disribuion b d and a g (d) ferries. 
By definition, we can compute E[β^{υ(d)}] as a truncated average Σ_{t=1}^T β^t Pr{Υ(d) = t}. The problem is that even this policy can become too complex to evaluate if the action space is large. To address this issue, we look at simpler approximations. First, we note that the global reward is related to the local reward as follows.

Lemma 5.2. For each domain with a^g(d) > 0 and local myopic control reward R_{π^l_d},

R_{π^l_d} − β^T ≤ E[β^{υ(d)}] ≤ R_{π^l_d}.

Proof: See Appendix.

The lemma indicates that for sufficiently large T, the myopic policy for global control will allocate the ferries to maximize the weighted sum of local myopic control rewards Σ_d m(d) R_{π^l_d}. Assuming myopic search policies are used for local control, from the previous analysis (Corollary 4.4), we can replace R_{π^l_d} by its lower bound βa^g(d)/[(1−β)N_d + βa^g(d)], which gives an approximate dispatch policy:

π^g(y^g) ≈ arg max_{a^g} Σ_{d=1}^D m(d)a^g(d) / [(1−β)N_d + βa^g(d)].   (16)

A nice property of this approximation is that the function f_d(a) := m(d)a/[(1−β)N_d + βa] is increasing and concave in a, i.e., extra ferries for a domain only provide diminishing gain compared with existing ferries. This concavity means that instead of searching all C(K+D−1, K) possible values of a^g, we can sequentially dispatch one ferry at a time to maximize the reward gain among all the domains. This property implies the following algorithm for implementing (16). Starting from a^g = 0, repeat the following K times:
1. find d* = arg max_{d=1,...,D} f_d(a^g(d)+1) − f_d(a^g(d));
2. update a^g(d*) ← a^g(d*) + 1.

Compared with the O(TD·C(K+D−1, K)) complexity^9 (per round) of the myopic dispatch policy, this approximation significantly reduces the complexity to O(KD). In the special case of large domains (N_d ≫ K, ∀d), we can further simplify (16) into an asymptotically approximate dispatch policy:

π^g(y^g) ≈ arg max_{a^g} Σ_{d=1}^D m(d)a^g(d)/N_d,   (17)

for which the optimal action is simply to dispatch all the ferries to the domain d* = arg max_d m(d)/N_d.

^8 A subtle difference is that υ(d) is the contact time truncated at T, although the difference will be small for large T; see Lemma 5.2.
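The sequential one-ferry-at-a-time procedure above can be sketched as follows (a minimal illustration, not the paper's code); by the concavity of each f_d, the greedy marginal-gain rule attains the maximum of the separable objective in (16):

```python
def greedy_dispatch(m, N, K, beta):
    """Greedy implementation of the approximate dispatch policy (Eq. 16).
    f_d(a) = m[d]*a / ((1-beta)*N[d] + beta*a) is increasing and concave
    in a, so dispatching one ferry at a time to the domain with the
    largest marginal gain maximizes the separable sum; cost O(K*D)."""
    D = len(m)

    def f(d, a):
        return m[d] * a / ((1 - beta) * N[d] + beta * a)

    alloc = [0] * D
    for _ in range(K):
        # Pick the domain whose next ferry yields the largest reward gain.
        d_star = max(range(D), key=lambda d: f(d, alloc[d] + 1) - f(d, alloc[d]))
        alloc[d_star] += 1
    return alloc
```

For example, a domain with zero buffered traffic never attracts a ferry, since its marginal gain is always zero.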
In contrast, for smaller domains, the ferries may be split among multiple domains in a round. In fact, under (16), they will all be dispatched to the same domain if and only if there exists d such that f_d(K) − f_d(K−1) ≥ f_{d'}(1) − f_{d'}(0) for any d' ≠ d.

5.3 Performance of Global Control

To analyze the performance of global control, it is necessary to characterize the evolution of the buffer states {m_τ}_{τ=1}^∞. For each domain d, if q_τ(d) denotes the delivery probability in round τ, then we see from (14) that {m_τ(d)}_{τ=1}^∞ follows the Markovian-like transition diagram in Fig. 3. However, it is not a real Markov chain, as q_τ(d) depends on the node beliefs and the policies, making it intractable for analysis. In this section, we derive lower bounds for the closed-form dispatch policy (17) to obtain insights, under the assumption of N_d ≫ K, ∀d.

Figure 3: Evolution of buffer state m_τ(d). (Diagram: a chain over increasing buffer levels λ_d, 2λ_d, 3λ_d, ..., where each round the buffer is delivered with probability q_τ(d) and accumulates otherwise.)

^9 By precomputing E[β^{υ(d)} | a^g(d) = k] for each d = 1, ..., D and k = 1, ..., K, we can reduce the complexity to O(TDK + D·C(K+D−1, K)) at the cost of extra space.

Under this policy, the delivery probability q_τ(d) and the average immediate reward E[β^{υ_τ(d)}] can be lower bounded in closed form as follows.

Lemma 5.3. Under myopic search policies and the dispatch policy in (17), let d_τ := arg max_d m_τ(d)/N_d denote

the domain served in round τ. We have

q_τ(d_τ) ≥ 1 − (1 − K/N_{d_τ})^T =: q_τ,   (18)

E[β^{υ_τ(d_τ)}] ≥ βK(1 − β^T(1 − K/N_{d_τ})^T) / (N_{d_τ} − β(N_{d_τ} − K)).   (19)

Proof: See Appendix.

These bounds suggest the following evolution of m_τ:^10

m_{τ+1} = m_τ + λ(T+1) − m_τ(d_τ)e_{d_τ}  w.p. q_τ;   m_{τ+1} = m_τ + λ(T+1)  o.w.   (20)

Since d_τ is a function of m_τ, the process {m_τ}_{τ=1}^∞ following evolution (20) is a Markov chain. This chain also determines the reward: since the expected immediate reward is m_τ(d_τ)E[β^{υ_τ(d_τ)}], applying (19) gives a lower bound on the reward

r(m_τ) := βK m_τ(d_τ)(1 − β^T(1 − K/N_{d_τ})^T) / (N_{d_τ} − β(N_{d_τ} − K)),   (21)

which is only a function of m_τ. These results lead to the following performance bound.

Theorem 5.4. Under myopic search policies and the approximate dispatch policy in (17), the discounted total throughput is lower bounded by

R_lb := E[Σ_{τ=1}^∞ β^{(T+1)(τ−1)} r(m_τ) | m_1],   (22)

where the expectation is over the Markov chain {m_τ}_{τ=1}^∞ specified in (20).

Proof: See Appendix.

Generally, (22) does not have a closed-form solution, as it depends on the transient statistics of the Markov chain. Hence, we develop a numerical method to approximate it arbitrarily closely. Let M_τ := {(m_τ^(i), p_τ^(i))} denote the set of all possible values of m_τ, each paired with the probability of reaching it from the initial buffer state m_1. Then Algorithm 1 computes a k-step approximation R_k of R_lb by computing M_τ (τ = 1, ..., k+1) iteratively. In each iteration (lines 2-8), it enumerates all elements of M_τ (line 4), computes the possible next states and their probabilities in M_{τ+1} (line 7), and accumulates the corresponding reward (line 8). Obviously, R_k is a lower bound of R_{π^g}. On the other hand, it is easy to see that assuming immediate delivery for every domain in rounds τ > k gives an upper bound:

R_{π^g} ≤ R_k + β^{(T+1)k} Σ_{(m_{k+1}, p_{k+1}) ∈ M_{k+1}} p_{k+1} (Σ_{d=1}^D m_{k+1}(d)) + β^{(T+1)(k+1)} (T+1)(Σ_{d=1}^D λ_d) / (1 − β^{T+1}).   (23)

Replacing line 2 by a stopping rule that requires the gap between the above upper bound and R_k to be at most ǫ, for a constant ǫ > 0, will guarantee that the resulting R_k is ǫ-close to R_{π^g}. Note that the size of M_τ grows exponentially as |M_τ| = 2^{τ−1}, and thus the accuracy of the above approximation will be limited in practice by this complexity constraint.

^10 Here w.p.
stands for "with probability" and o.w. for "otherwise."

Algorithm 1 Evaluate Global Reward
Require: Domain sizes (N_d)_{d=1}^D, data rates λ, number of ferries K, round length T, discount factor β, initial BS buffer state m_1, and horizon k.
Ensure: Return the k-step approximated reward lower bound R_k.
1: M_1 ← {(m_1, 1)}, R_k ← 0
2: for τ = 1 to k do
3:   M_{τ+1} ← ∅
4:   for all (m_τ, p_τ) ∈ M_τ do
5:     d_τ ← arg max_d m_τ(d)/N_d
6:     q_τ ← 1 − (1 − K/N_{d_τ})^T
7:     add (m_τ + λ(T+1) − m_τ(d_τ)e_{d_τ}, p_τ q_τ) and (m_τ + λ(T+1), p_τ(1 − q_τ)) to M_{τ+1}
8:     R_k ← R_k + β^{(T+1)(τ−1)} p_τ r(m_τ)

6. SIMULATION AND COMPARISON WITH PREDETERMINED CONTROL

Having developed the DAS (dispatch-and-search) policies, we now evaluate their performance against appropriate benchmarks. A benchmark of particular interest is the class of predetermined control policies specified by fixed ferry routes. In the sequel, we will derive the optimal predetermined policies for local and global control and compare them with the proposed policies.

6.1 Local Control

For the predetermined local controller, knowledge of the node location is fixed at the steady-state distribution b_∞. Thus, to maximize the chance of contact, the best policy for the ferries is to wait for the node at the k most probable cells s^(i) (i = 1, ..., k, where k is the number of ferries) such that b_∞(s^(1)) ≥ b_∞(s^(2)) ≥ .... We will call this the waiting policy.

We now compare the waiting policy with the myopic search policy. Suppose the node follows a 2-D random walk on a grid, with the level of mobility controlled by an activeness parameter α := Σ_{j≠i} P(i, j) (i.e., the probability of moving to a different cell), starting from the steady-state distribution (the uniform distribution). As illustrated in Figs. 4-5, the performance of both policies improves (the reward increases while the contact time decreases) as the number of ferries increases, and the myopic policy clearly outperforms the waiting policy. The performance gap, however, varies depending on the level of mobility, with a larger gap in low-mobility cases. Intuitively, this is because the waiting policy relies on node mobility to create contact opportunities, whereas the myopic policy will actively search for the node and is thus less affected.
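The random-walk mobility model used in these simulations can be sketched as follows; a wrap-around (torus) grid is assumed here purely for illustration, which makes the chain doubly stochastic so that its stationary distribution is exactly uniform:

```python
def torus_walk_P(n, alpha):
    """Transition matrix (list of rows) of a 2-D random walk on an
    n x n torus grid: stay put w.p. 1 - alpha, otherwise move to one of
    the 4 wrap-around neighbors.  alpha is the 'activeness' parameter,
    i.e., the probability of moving to a different cell."""
    N = n * n
    P = [[0.0] * N for _ in range(N)]
    for r in range(n):
        for c in range(n):
            i = r * n + c
            P[i][i] = 1.0 - alpha
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                j = ((r + dr) % n) * n + (c + dc) % n
                P[i][j] += alpha / 4.0   # += handles coinciding wraps
    return P
```

Each row and each column sums to one, so the uniform distribution is stationary, matching the steady state assumed by the waiting policy above.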
We also plot the lower and upper bounds on the reward (Corollaries 4.4-4.5) and the corresponding bounds on the mean contact time; we note that both bounds track the actual performance consistently and that the bounds are fairly tight. We have also conducted similar simulations under biased mobility (Figs. 6-7), modeled by a localized random walk with tightness parameter η [10]. This model allows non-uniform transition probabilities P(i, j) biased toward a home cell h (if η > 0), decaying with the taxicab distance between cell j and h, with η controlling the level of bias (η = 0 recovers the standard random walk). In the simulations, the home cell was at the center of the grid. The results indicate that the performance trends and comparisons are similar

to those under symmetric mobility, but with a smaller gap between the two policies, as biased mobility provides more information to the waiting policy through its non-uniform steady-state distribution. The analytical bounds are looser in these cases.

Figure 4: Myopic search policy vs. waiting policy: low-mobility random walk (α = 0.1, η = 0, N = 25, β = 0.9; Monte Carlo averages). Panels: (a) local reward; (b) mean contact time.
Figure 5: Myopic search policy vs. waiting policy: high-mobility random walk (α = 0.9, η = 0, rest as in Fig. 4).
Figure 6: Myopic search policy vs. waiting policy: low-mobility, localized random walk (α = 0.1, η = 0.5, rest as in Fig. 4).
Figure 7: Myopic search policy vs. waiting policy: high-mobility, localized random walk (α = 0.9, η = 0.5, rest as in Fig. 4).

In all of the above simulation results, we observe that the reward and the contact-time metrics both improve monotonically as the number of ferries increases, but the marginal gain becomes smaller, particularly in the case of the contact time; this is in accordance with the theoretical results developed in Section 4.3.

6.2 Global Control

The problem of predetermined global control is analogous to that of dynamic global control in Section 5.1, except that the beliefs of the node locations are fixed at their steady states, and the BS buffer state is replaced by its expectation m̄_τ. For a fair comparison with the myopic dispatch policy (15), we consider the following predetermined dispatch policy:

π̄^g(m̄) = arg max_{a^g} Σ_{d=1}^D m̄(d) E[β^{ῡ(d)}],   (24)

where ῡ(d) is the contact time in domain d under the waiting policy, starting from the steady-state distribution. After one round, the expected buffer state will evolve as m̄_{τ+1}(d) = λ_d(T+1) + m̄_τ(d) Pr{ῡ(d) > T}.
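Since E[β^{ῡ(d)}] and Pr{ῡ(d) > T} are fixed for each ferry count, the predetermined controller's trajectory can be rolled forward deterministically offline. A minimal sketch, in which the tables w[d][k] (standing in for E[β^{ῡ(d)}] with k assigned ferries) and p_miss[d][k] (for Pr{ῡ(d) > T}) are hypothetical precomputed inputs, with w[d][0] = 0 and p_miss[d][0] = 1 encoding an unserved domain, and λ_d(T+1) folded into lam_round[d]:

```python
from itertools import combinations_with_replacement

def predetermined_rollout(m1, lam_round, w, p_miss, K, rounds):
    """Roll the predetermined dispatch policy (Eq. 24) forward: each
    round, pick the ferry allocation maximizing sum_d m[d]*w[d][a[d]],
    then update the expected buffers by
    m[d] <- lam_round[d] + m[d] * p_miss[d][a[d]]."""
    D = len(m1)
    m = list(m1)
    history = []
    for _ in range(rounds):
        best, alloc = float("-inf"), None
        for combo in combinations_with_replacement(range(D), K):
            a = [0] * D
            for d in combo:
                a[d] += 1
            v = sum(m[d] * w[d][a[d]] for d in range(D))
            if v > best:
                best, alloc = v, a
        history.append((list(m), alloc))
        m = [lam_round[d] + m[d] * p_miss[d][alloc[d]] for d in range(D)]
    return history
```

Because nothing in the loop is random, the entire sequence of buffers and allocations is determined by m1 alone, which is the point made next.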
A crucial difference between (24) and (15) is that E[β^{ῡ(d)}] and Pr{ῡ(d) > T} are fixed for any given a^g(d). Therefore, given the initial buffer state m̄_1 = m_1, the entire sequences of {m̄_τ}_{τ=1}^∞ and {a^g_τ}_{τ=1}^∞ are determined. To complete the policy specification, we need to evaluate E[β^{ῡ(d)}] and Pr{ῡ(d) > T} for a given a^g(d) = k ∈ {1, ..., K}. Let H (the index d is dropped for simplicity) denote the hitting time of the k most probable cells {s^(i)}_{i=1}^k in the steady state b_∞, starting from b_∞. Then E[β^{ῡ(d)}] = Σ_{t=1}^T β^t Pr{H = t} and Pr{ῡ(d) > T} = Pr{H > T}. By analysis similar to that in Section 4.3, we have Pr{H = t} = p_t Π_{j=1}^{t−1}(1 − p_j), where p_t is the conditional contact probability given by p_t = Σ_{i=1}^k b_t(s^(i)), for b_1 = b_∞ and b_{t+1} = P^T b_t\{s^(i)}_{i=1}^k (the belief with the waiting cells removed, i.e., conditioned on no contact so far).

We now compare the performance of the above predetermined policy with that of the myopic dispatch policy (15) and its approximations (16)-(17). We assume a 2-D random walk for each domain with identical size, activeness, and traffic rate. We first evaluate the performance of these policies for different values of the round length T; see Fig. 8 ("approx 1" for (16) and "approx 2" for (17)), where for each T, we simulate the policies long enough to approximate the total reward. Interestingly, although guaranteeing a contact takes many slots even if the nodes do not move, all the dispatch policies achieve their best performance at small values of T: the predetermined policy (24) peaks at the smallest T, the approximation (17) next, and the myopic policy (15) and its approximation (16) at the largest of the three. Based on the above results, we compare the policies, each under its best T, with respect to (w.r.t.) the discounted throughput λ_{β,T} := E[Σ_t N(t)β^t] (Fig. 9), where N(t) denotes the amount of traffic delivered in slot t, and w.r.t. the undiscounted throughput λ_{1,T} (Fig. 10). The myopic policy significantly outperforms the predetermined policy on both metrics (by 125% on λ_{β,T} and by 47% on λ_{1,T}). Moreover, the two approximations provide close-to-myopic performance at much reduced complexities, where (16) even outperforms the myopic policy on λ_{β,T}. This shows that the myopic policy is suboptimal for global control.
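The hitting-time computation above can be sketched as follows (a simplified illustration, not the paper's code; whether the node moves before the first contact check is a modeling choice made here):

```python
def waiting_hitting_dist(P, b_inf, k, T):
    """Pr{H = t}, t = 1..T, for the hitting time of the k most probable
    steady-state cells under the waiting policy: ferries park on those
    cells, the node moves by P, and after each miss the belief is
    conditioned on 'no contact yet'."""
    n = len(b_inf)
    cells = sorted(range(n), key=lambda i: -b_inf[i])[:k]
    b = list(b_inf)                       # b_1 = steady-state belief
    surv, dist = 1.0, []                  # surv = Pr{no contact before t}
    for _ in range(T):
        tot = sum(b)
        p = sum(b[i] for i in cells) / tot if tot > 0 else 0.0
        dist.append(surv * p)             # Pr{H = t} = p_t * prod(1 - p_j)
        surv *= 1.0 - p
        for i in cells:                   # condition on no contact
            b[i] = 0.0
        s = sum(b)
        if s > 0:
            b = [x / s for x in b]
        # node moves: b <- P^T b
        b = [sum(b[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist
```

The truncated quantities E[β^{ῡ}] and Pr{ῡ > T} then follow by summing β^t·dist[t−1] and 1 − Σ dist, respectively.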
We also evaluate an ǫ-approximation (with ǫ = 0.25) of the analytical lower bound in (22) by Algorithm 1, which gives a conservative lower bound for the myopic policies. We repeat the above simulations for a heterogeneous case

of different mobility parameters and traffic rates for each domain under the localized random walk model defined in Section 6.1; see Figs. 11-13 (the optimal T is larger for the myopic policies than for the predetermined policy). The results show similar trends, except that the performance gap between the myopic policies and the predetermined policy shrinks (the myopic policy outperforms the predetermined policy by 68% on λ_{β,T} and by 26% on λ_{1,T}). Intuitively, this is because in heterogeneous cases, prior knowledge such as domain sizes, mobility parameters, and traffic rates already provides good distinction between domains, which helps the predetermined dispatch policy just as biased mobility helps the waiting policy in local control.

Figure 8: Optimizing T (D = 2, N_d ≡ 25, α_d ≡ 0.5, η_d ≡ 0, K = 5, λ_d ≡ 1, β = 0.9; Monte Carlo averages; curves: myopic, approx 1, approx 2, predetermined).
Figure 9: Myopic vs. predetermined dispatch policy: discounted throughput (T optimized by Fig. 8, rest as in Fig. 8).
Figure 10: Myopic vs. predetermined dispatch policy: undiscounted throughput (same setting as Fig. 9).

7. CONCLUSION

We have considered the control of multiple data ferries in partitioned wireless networks. Compared with the literature, our solution deals with the more challenging case of stochastically moving nodes and limited ferry-to-node and ferry-to-ferry communication ranges. Using the approach of stochastic control, we develop a fully dynamic solution via the hierarchical policy of dispatch and search, which only requires local cooperation between ferries and provides substantial performance improvements over existing predetermined control in high-uncertainty scenarios (e.g., uniform node steady-state distributions, identical partitions). Although we have focused on data dissemination in presenting the detailed solution, our approach is applicable to other communication scenarios as well.
Under other communication modes, the local control remains the same whereas the global control needs to be modified. For example, a global POMDP similar to that in Section 5.1, with m_τ(d) denoting the traffic generated by domain d and reward r^g(x^g, a^g, o^g) := Σ_{d=1}^D m(d) β^{T+1} I{υ(d) ≤ T}, corresponds to data harvesting (with delayed pickup). A more complicated variation can model peer-to-peer communication via the relay of the BS, by tracking the buffer states of each source domain and the BS with a set of state variables and using the BS buffer state m_τ to compute the reward r^g(x^g, a^g, o^g) as in Section 5.1. Detailed studies are left to future work.

8. REFERENCES

[1] G. D. Celik and E. Modiano. Dynamic vehicle routing for data gathering in wireless networks. In IEEE CDC, December 2010.
[2] T. He, K.-W. Lee, and A. Swami. Flying in the dark: Controlling autonomous data ferries with partial observations. In ACM MobiHoc, 2010.
[3] D. Henkel and T. Brown. On controlled node mobility in delay-tolerant networks of unmanned aerial vehicles. In ISART, 2006.
[4] D. Henkel and T. Brown. Towards autonomous data ferry route design through reinforcement learning. In IEEE/ACM WoWMoM, 2008.
[5] D. Jea, A. Somasundara, and M. Srivastava. Multiple controlled mobile elements (data mules) for data collection in sensor networks. In DCOSS, 2005.
[6] V. Kavitha and E. Altman. Analysis and design of message ferry routes in sensor networks using polling models. In IEEE WiOpt, May 2010.
[7] C. Papadimitriou and J. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 1987.
[8] E. Sondik. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 1978.
[9] M. Tariq, M. Ammar, and E. Zegura. Message ferry route design for sparse ad hoc networks with mobile nodes. In ACM MobiHoc, 2006.
[10] B. Walker, T. Clancy, and J. Glenn. Using localized random walks to model delay-tolerant networks. In IEEE MILCOM, 2008.
[11] J. Wu, S. Yang, and F. Dai. Logarithmic store-carry-forward routing in mobile ad hoc networks. IEEE Trans. Parallel and Distributed Systems, 2007.
[12] Z. Zhang and Z. Fei.
Route design for multiple ferries in delay tolerant networks. In IEEE WCNC, 2007.
[13] W. Zhao, M. Ammar, and E. Zegura. Controlling the mobility of multiple data transport ferries in a delay-tolerant network. In IEEE INFOCOM, 2005.
[14] W. Zhao, M. Ammar, and E. Zegura. A message ferrying approach for data delivery in sparse mobile ad hoc networks. In ACM MobiHoc, 2004.

APPENDIX

Proof of Lemma 4.2

Based on (8), we have

∂R_π/∂p^π_t = β^t Π_{j=1}^{t−1}(1 − p^π_j) − Σ_{s=t+1}^∞ β^s p^π_s Π_{j=1, j≠t}^{s−1}(1 − p^π_j)
            = [Π_{j=1}^{t−1}(1 − p^π_j)] (β^t − E[β^{Υ_π} | Υ_π > t]) ≥ 0,

since Υ_π > t implies β^{Υ_π} ≤ β^{t+1} ≤ β^t. Therefore, R_π is monotone increasing with p^π_t.
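As a numeric companion to this proof, the reward R_π = Σ_{t≥1} β^t p^π_t Π_{j<t}(1 − p^π_j) and its monotonicity in each p^π_t can be checked directly (a small sketch; the constant-p closed form below is the geometric sum also used in the proof of Lemma 5.3):

```python
def discounted_contact_reward(beta, p):
    """R = sum_{t>=1} beta^t * p[t-1] * prod_{j<t}(1 - p[j-1]):
    the expected discount at the (random) first-contact time, as in
    the proof of Lemma 4.2 (t indexed from 1)."""
    R, surv = 0.0, 1.0                  # surv = prod_{j<t}(1 - p_j)
    for t, pt in enumerate(p, start=1):
        R += (beta ** t) * pt * surv
        surv *= 1.0 - pt
    return R
```

For a constant contact probability q per slot and horizon T, this sums to βq(1 − (β(1−q))^T)/(1 − β(1−q)), the expression appearing in the proof of Lemma 5.3.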

Figure 11: Optimizing T (D = 3, N := (N_d)_{d=1}^D = (16, 25, 36), α = (0.9, 0.5, 0.1), η = (0, 0.2, 0.5), K = 5, λ = (1, 0.75, 0.5), β = 0.9; Monte Carlo averages; curves: myopic, approx 1, approx 2, predetermined).
Figure 12: Myopic vs. predetermined dispatch policy: discounted throughput (T optimized by Fig. 11, rest as in Fig. 11).
Figure 13: Myopic vs. predetermined dispatch policy: undiscounted throughput (same setting as Fig. 12).

Proof of Lemma 4.3

First, we prove by induction that (recall b^(t) = (P^T)^t b_1)

b^π_t(i) ≥ b^(t)(i) Π_{j=1}^{t−1} [1 − Σ_{s∈a^π_j} b^(j)(s)], ∀i ∈ S, t ≥ 1.   (25)

For t = 1, b^π_1 = P^T b_1 = b^(1). For t > 1,

b^π_t(i) = [Σ_j b^π_{t−1}(j) P_{j,i}] [1 − Σ_{s∈a^π_{t−1}} b^π_{t−1}(s)]^{−1} ≥ b^(t)(i) Π_{j=1}^{t−1} [1 − Σ_{s∈a^π_j} b^(j)(s)],

obtained by applying (25) to b^π_{t−1}(j). This proves (25). Then, we apply the above to p^π_t = Σ_{s∈a^π_t} b^π_t(s):

p^π_t ≥ Σ_{s∈a^π_t} b^(t)(s) Π_{j=1}^{t−1} [1 − Σ_{s∈a^π_j} b^(j)(s)] ≥ p_t

for p_t defined as in (10).

Proof of Corollary 4.4

The proof is based on the fact that p_t ≥ k/N, ∀t. By Lemma 4.2, we can lower bound R_π by replacing p_t with k/N, which yields

R_π ≥ Σ_{t=1}^∞ β^t (k/N)(1 − k/N)^{t−1} = βk/[N(1 − β) + βk].

Proof of Corollary 4.5

If b_1 = b_∞, then b^(t) = b_∞, ∀t. Accordingly, B_{t,k} ≡ B_k, ∀t, and p_t = B_k(1 − B_k)^{t−1}. Plugging these into (9) gives (11). If, in addition, P is doubly stochastic, then it is known that b_∞ is uniform, i.e., B_k = k/N, applying which to (11) gives

R = k(β − β^{⌊N/k⌋+1}) / [N(1 − β)] + β^{⌊N/k⌋+1}(1 − ⌊N/k⌋k/N).   (26)

Maximizing the right-hand side of (26) with respect to N, calculation shows that the maximum is achieved at one of three candidate values. At the first two, the right-hand side equals βk(1 − β^{N/k})/[N(1 − β)] < βk/[(1 − β)N]; at the third, it is likewise less than βk/[(1 − β)N]. Combining both gives R ≤ βk/[(1 − β)N].

Proof of Lemma 5.2

The upper bound holds because R_{π^l_d} = E[β^{Υ(d)}] while E[β^{υ(d)}] is a truncated average. The lower bound is because

R_{π^l_d} − E[β^{υ(d)}] = Σ_{t=T+1}^∞ β^t Pr{Υ(d) = t} ≤ β^T Pr{Υ(d) > T} ≤ β^T.   (27)

Proof of Lemma 5.3

Let p be the conditional contact probability in domain d_τ. By definition, we have p ≥ K/N_{d_τ} for the myopic search policy.
From the analysis of local control (Section 4.3), we have q_τ(d_τ) = 1 − (1 − p)^T and

E[β^{υ_τ(d_τ)}] = Σ_{t=1}^T β^t Pr{υ_τ(d_τ) = t} = Σ_{t=1}^T β^t p(1 − p)^{t−1},

both increasing with p. Plugging in the lower bound of p yields the results.

Proof of Theorem 5.4

Due to discounting, the total throughput is an increasing function of the probability of delivery q_τ(d_τ) and is thus lower bounded if q_τ(d_τ) is replaced by its lower bound in (18), making m_τ evolve according to (20). Moreover, for given (b_τ, m_τ), the expected immediate reward under the dispatch policy (17) is m_τ(d_τ)E[β^{υ_τ(d_τ)}], which is lower bounded by r(m_τ) defined in (21) for any b_τ due to (19). Combining these two facts proves the result.