Online Batch Scheduling for Flow Objectives

Sungjin Im    Benjamin Moseley

January 16, 2013

Abstract

Batch scheduling is a powerful way of increasing throughput by aggregating multiple homogeneous jobs. It has applications in large-scale manufacturing as well as in server scheduling. In batch scheduling, explained in the server-scheduling setting, the server can process requests of the same type simultaneously, up to a certain number. Batch scheduling can be seen as capacitated broadcast scheduling, a popular model considered in scheduling theory. In this paper, we consider an online batch scheduling model. For this model we address flow time objectives for the first time and give positive results for average flow time, the k-norms of flow time, and maximum flow time. For average flow time and the k-norms of flow time we give algorithms that are O(1)-competitive with a small constant amount of resource augmentation. For maximum flow time we give a 2-competitive algorithm, and this is the best possible competitive ratio for any online algorithm.

Department of Computer Science, Duke University, Durham NC 27708-0129. sungjin@cs.duke.edu. Partially supported by NSF grant CCF-1008065.
Toyota Technological Institute, Chicago IL 60637, USA. moseley@ttic.edu

1 Introduction

A majority of work in the scheduling literature studies the case where every job must be processed sequentially and only on a single machine at any point in time. However, more general problems arise in practice. For instance, jobs could be parallelizable; in this setting a job can be processed on multiple machines simultaneously. Scheduling parallelizable jobs has received a significant amount of attention in the theoretical scheduling literature, for instance [13, 9, 15, 23, 14]. The reason to address the parallelizability of a job is to be able to process more work per unit time by using more machines. Another way to increase the work done per unit time is to batch homogeneous jobs. In a batch scheduling setting, jobs of the same type can be aggregated and processed together simultaneously. Thus, one unit of processing can decrease the work that needs to be performed on more than one job.

Batch scheduling arises in many real-world systems. For example, if jobs require the same code, the server can let the jobs share the code loaded into memory. Or, in a multicast network, many different clients can receive a communication simultaneously. Batch scheduling is not restricted to computers and networks: in manufacturing lines, such as semiconductor manufacturing, tasks are grouped together to be processed simultaneously. See [22] for a variety of industry applications of batched scheduling.

A special type of batch scheduling, known as broadcast scheduling, has received a significant amount of attention recently in the scheduling literature. In the broadcast scheduling model, there are n pages of data stored at a server. Over time, requests arrive for specific pages. When the server broadcasts a page p, all unsatisfied/outstanding requests for p are satisfied simultaneously. Broadcast scheduling finds applications in multicast systems, LAN and wireless networks [24, 1, 2]. Notice that here any number of requests can be satisfied simultaneously, which can be viewed as having an infinite batch size. In practice, however, there could be a limit on the number of requests that a server can handle simultaneously, as was pointed out in [7].

In this paper, we consider online batch scheduling where there is possibly some limit on the number of jobs that can be processed simultaneously. In the online setting, the scheduler is only aware of a request once it is sent to the system. Inspired by the broadcast model, we consider a generalization of this model to the batched setting. As in the broadcast model, there are n pages of information stored at the server. Requests arrive over time for different pages. The server can satisfy up to $B_p$ requests for page p simultaneously; here $B_p$ may differ for each page p and takes some integral value in $(0, \infty]$. We note that the model we consider is not restricted to applications in wireless networks. Indeed, one can think of the n pages as n different types of jobs: over time, requests arrive for a specific type of job p, and up to $B_p$ jobs of type p can be done simultaneously. This model accurately captures a variety of batched settings, for instance batch scheduling of tasks in an assembly line.

The flow time of a job is the amount of time it takes the server to satisfy the job, and a client is interested in having the flow time of their job minimized. Let $J_{p,i}$ be the $i$th request for page p, and say that this request arrives at time $r_{p,i}$. Each page p has a size $\ell_p$ which specifies the amount of work on page p that is required to complete a request for page p. We assume that a page consists of a sequence of unit-size pieces $(p,1), (p,2), \ldots, (p,\ell_p)$. The server can broadcast/schedule one piece of a page at each time step, and a request needs to be in a batch for each piece of the page it requested in order to be completed. For a given schedule, let $C_{p,i}$ be the time at which $J_{p,i}$ is completed. The flow time of the request is $C_{p,i} - r_{p,i}$.

The scheduler's goal is to determine how jobs should be processed so that the clients are given good service; that is, the scheduler's goal is to optimize a quality of service metric. The most popular quality of service metrics are based on the flow time of the jobs. In this paper, we study flow time objectives for the first time in batch scheduling when the batch size is between 1 and $\infty$. The most popular flow-time-based quality of service metric is minimizing the total or average flow time.

Here the scheduler's goal is to minimize $\sum_{p,i} (C_{p,i} - r_{p,i})$, and this objective essentially minimizes the average quality of service. Unfortunately, by optimizing average flow time the scheduler could give a few clients poor quality of service in order to optimize the flow time of other jobs. Thus, optimizing average flow time is not necessarily fair to all of the jobs. An objective that focuses on ensuring that no job is given poor quality of service is minimizing the maximum flow time. Finally, an objective that is also used to ensure fairness, while not being as stringent as maximum flow time, is minimizing the k-norms of flow time. Here the objective is to minimize $\big(\sum_{p,i} (C_{p,i} - r_{p,i})^k\big)^{1/k}$ for some $k \in (1, \infty)$. By minimizing the k-norms for small values of k greater than one, the scheduler controls the variance of the requests' flow times, which enforces fairness [6]. Note that average flow time is equivalent to setting $k = 1$ and maximum flow time is equivalent to setting $k = \infty$. Each of these objectives is widely considered in scheduling theory, and which one to optimize depends on the requirements of the system at hand.

In the theoretical scheduling literature, batch scheduling has been studied only for the objective of maximizing the total throughput [7, 16]. For this objective, each request $J_{p,i}$ has a release time $r_{p,i}$ and a deadline $d_{p,i}$; if the request is completed by its deadline, the scheduler receives some profit $w_{p,i}$, and the goal is to maximize the total profit obtained. Flow objectives are quite different from the throughput objective. For example, there exists a simple 2-competitive greedy algorithm for the throughput objective [7] (we note that [7] studies a more general batch scheduling setting, in the offline model). However, it is known that any deterministic online algorithm is $\Omega(n)$-competitive for the average flow time objective [20]; this strong lower bound holds even in the special case of broadcast scheduling. Further, any randomized algorithm is $\Omega(\sqrt{n})$-competitive [3].

To give a flavor of the difficulty of batch scheduling, it is useful to consider the following algorithm. As discussed before, the main reason to consider batch scheduling is the ability to satisfy multiple jobs simultaneously. Hence, it is natural to think that the algorithm which processes the page with the largest number of outstanding requests would perform well for batched scheduling. This algorithm is known as Most Requests First (MRF). It is known that MRF is 2-competitive for maximizing throughput [21, 11]; however, it is far from optimal for minimizing average flow time, maximum flow time, and the k-norms of flow time. Consider the following simple example where the batch size is at most two for each page. At time 0, $n/2$ requests arrive, one for each distinct page. For simplicity, assume that all pages are unit size and thus can be completed in one time step. In addition, at each integer time until time $n^2$, the same page p is requested twice. Then MRF will broadcast page p at every time step until time $n^2$, thereby leaving many requests unsatisfied for a long time. However, one can broadcast page p every other time step and process other requests in between. This schedule's objective is a factor $\Omega(n)$ smaller than MRF's for each of the objectives mentioned.

As mentioned, there are strong lower bounds on the competitive ratio of any online algorithm in the batched scheduling setting for the problems of minimizing average flow time and the k-norms of flow time. Due to these strong lower bounds, we analyze algorithms in the popular resource augmentation model [19]. In the resource augmentation model, an algorithm is given extra resources over the adversary and then the competitive ratio is bounded. In previous work, the resource augmentation usually comes in the form of being able to process jobs at a faster rate: an algorithm is said to be s-speed c-competitive if it is c-competitive when it can process jobs s times faster than the adversary. In our setting, we consider algorithms that may have resource augmentation both in how fast jobs can be processed and in the number of jobs that can be satisfied together. We say that an algorithm is s-speed $\eta$-capacity c-competitive if the algorithm can process jobs s times faster than the adversary, its batches can be $\eta$ times larger, and it achieves a competitive ratio of c. The ultimate goal of a resource augmentation analysis is to find a constant-competitive algorithm with a minimal amount of extra resource augmentation.

Our Results: As mentioned, we consider minimizing flow time objectives for the first time in the batched scheduling setting. First we consider the problems of minimizing average flow time and the k-norms of flow time; due to space constraints, the results for the k-norms can be found in the appendix. For both of these problems we consider a generalization of an algorithm that was considered in the broadcast scheduling setting [5]. Both results follow a similar framework. We distinguish two cases depending on the size of pages: one where all pages are unit size (i.e., $\ell_p = 1$ for all p) and one where pages have varying sizes. For unit-size pages we show the following theorems.

Theorem 1.1. For any $\epsilon > 0$, there exists an online algorithm that is $(1+\epsilon)$-speed 2-capacity $O(1/\epsilon^3)$-competitive for minimizing average flow time when pages are unit-size.

Theorem 1.2. For any $\epsilon > 0$ and any fixed integer $k \in [1, \infty)$, there exists an online algorithm that is $(1+\epsilon)$-speed 2-capacity $O(1/\epsilon^4)$-competitive for minimizing the k-norms of flow time when pages are unit-size.

We extend these results to varying-size pages. Here we require a further relaxation of the capacity constraints.

Theorem 1.3. For any $\epsilon > 0$, there exists an online algorithm that is $(1+\epsilon)$-speed 6-capacity $O(1/\epsilon^3)$-competitive for minimizing average flow time when pages are varying-size.

Theorem 1.4. For any $\epsilon > 0$ and any fixed integer $k \in [1, \infty)$, there exists an online algorithm that is $(1+\epsilon)$-speed 6-capacity $O(1/\epsilon^4)$-competitive for minimizing the k-norms of flow time when pages are varying-size.

Finally, we consider the problem of minimizing the maximum flow time. In this case we do not use resource augmentation at all, neither on the processing speed nor on the capacity of the batches. The algorithm considered is the natural extension of first-in-first-out (FIFO), which always prioritizes the request that arrived the earliest.

Theorem 1.5. There exists a 2-competitive algorithm for minimizing the maximum flow time.

This result is tight, since any randomized online algorithm was shown to have competitive ratio larger than $2 - \epsilon$ for any fixed constant $\epsilon > 0$, even in the special case of $B_p = \infty$ [10, 11]. Due to space constraints, this result can be found in Appendix C.

Our Technical Contributions: As mentioned before, our algorithm and analysis are inspired by previous work in broadcast scheduling [5, 14, 12]. We first give a high-level view of the approach in [5, 14] for the k-norm objectives, $1 \le k < \infty$. The main difficulty of broadcast scheduling comes from the fact that an optimal schedule can be very effective in processing multiple requests simultaneously compared to the online scheduler, particularly because the optimal scheduler knows the future input while our algorithm does not. Hence it is challenging for an online algorithm to decide when and what to broadcast: the online algorithm may want to wait longer, hoping for a chance to aggregate more requests, but waiting increases the flow time of the waiting requests. The key idea used in [5, 14] is to first obtain a fractionally good schedule. To illustrate the idea, assume that all pages are unit size, that requests arrive at integer times, and that it takes one unit of time to broadcast a page p. In the real (integral) schedule, the scheduler is allowed to transmit only one page per time step. In the fractional schedule, however, at each instantaneous time multiple pages can be broadcast fractionally, up to a total of one unit. Then, in the fractional schedule, a request $J_{p,i}$ is satisfied once page p has been transmitted by one unit in total since its release time $r_{p,i}$. They then convert the fractional schedule into a valid real schedule, in which only one page is broadcast per time step, in an online manner.

Up to this point, our algorithm and analysis are fairly similar to the previous work: we also construct a fractional schedule and convert it into a valid schedule in an online manner. However, in batch scheduling one has to decide which requests get satisfied in a batch, whereas in broadcast scheduling all requests get satisfied together. Deciding which requests get satisfied makes the problem more challenging. Our algorithm, at any point in time, will decide on a single request that it wants to satisfy; however, it also needs to decide which requests to group in a batch with this request. Our algorithm defines a ball around each request, consisting of the requests that get satisfied if this request is chosen to be satisfied. The ball is defined over requests for the same page that arrived closest to this request. Using this, we show a fractionally good schedule. However, the online conversion to a valid schedule is much more challenging in our case, and we need a more intricate analysis. The reason is that when pages have varying sizes, the batch a request is in can change over time, and we need to be careful about how the algorithm and the optimal solution group requests. To do this, we determine some structural properties of the optimal solution that we can compare against.

Related Work: Broadcast scheduling for minimizing the average flow time is NP-hard [10]. It is also NP-hard to optimize the maximum flow time objective [10]. Since broadcast scheduling is a special case of the problems considered in this paper, each of the problems we consider is NP-hard. The best known approximation for average flow time in broadcast scheduling is an $O(\log^2 n / \log\log n)$-approximation [4]. For the k-norms objective, [14] gave the first scalable algorithm. For the problem of minimizing the maximum flow time, [12] gave a 2-approximation.

2 Formal Problem Definition and Notation

There are n pages of information available at the server. Each request $J_{p,i}$ arrives at an integer time $r_{p,i}$, and this is the first time the scheduler is aware of the request. Here we use $J_{p,i}$ to denote the request for page p with the i-th earliest arrival time; ties are broken arbitrarily but in a fixed way. For any p and i, we say that the two requests $J_{p,i}$ and $J_{p,i+1}$ are adjacent. Each page p has size $\ell_p$; that is, page p consists of $\ell_p$ pieces of unit-size information, $(p,1), (p,2), \ldots, (p,\ell_p)$. At each time t (or, equivalently, between time t and t+1), the server can transmit only one piece of information, and it can service up to $B_p$ requests when transmitting a piece of information for page p. Note that, unlike in broadcast scheduling, the server needs to decide which requests to serve if there are more than $B_p$ outstanding requests. A request $J_{p,i}$ is satisfied at the first time when it has received, since its release time $r_{p,i}$, all pieces of information $(p,1), (p,2), \ldots, (p,\ell_p)$ in sequential order. Such a sequence of transmissions does not have to be contiguous and can be interrupted by other transmissions. We let $C_{p,i}$ denote the completion time of request $J_{p,i}$. We distinguish two cases depending on the page sizes: unit-size pages and varying-size pages. In the unit-size case, $\ell_p = 1$ for all pages p, and hence the server does not have to decide which piece of information to transmit. For all objectives considered in this paper, the server is required to satisfy all requests. As mentioned before, the k-norm of flow time is defined as $\big(\sum_{p,i} (C_{p,i} - r_{p,i})^k\big)^{1/k}$. This objective becomes the total (or, equivalently, average) flow time and the maximum flow time when $k = 1$ and $k = \infty$, respectively. When the algorithm is given speed $(1+\epsilon)$, we assume that the algorithm is allowed to transmit an extra piece of information every $1/\epsilon$ time steps; here $1/\epsilon$ is assumed to be an integer. When the algorithm has $\eta$-capacity, it can serve up to $\eta B_p$ requests for the same page p in a batch.
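To make the model's bookkeeping concrete, the following is a minimal sketch (in Python, with our own class and function names; it is not from the paper) of how requests, batched transmissions, and the flow-time objectives defined above can be represented.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    page: int
    arrival: int                      # r_{p,i}
    pieces_received: int = 0          # the next needed piece is pieces_received + 1
    completion: Optional[int] = None  # C_{p,i}, set once all l_p pieces are received

def transmit(page, piece, t, batch, B, lengths):
    """Transmit piece (page, piece) during time step t to a chosen batch of requests."""
    assert len(batch) <= B[page]      # at most B_p requests may share one transmission
    for r in batch:
        # a request benefits only if this is exactly the next piece it needs
        if r.page == page and r.completion is None and r.pieces_received + 1 == piece:
            r.pieces_received += 1
            if r.pieces_received == lengths[page]:
                r.completion = t + 1  # satisfied at the end of the step

def flow_times(requests):
    return [r.completion - r.arrival for r in requests if r.completion is not None]

def k_norm(flows, k):
    # k = 1 gives the total flow time; large k approaches the maximum flow time
    return sum(f ** k for f in flows) ** (1.0 / k)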

3 Preliminaries and Overview of the Analysis

For the objectives of minimizing the average flow time and the k-norms of flow time, we first show that an online algorithm is fractionally scalable. The main analysis tool for this is potential functions, which have become popular for designing and analyzing scheduling algorithms; see [18] for a tutorial. We then convert the fractional schedule into an integral one with additional speed augmentation. In this section, we first define the fractional schedule and give a quick overview of potential function based analysis. We also briefly discuss the online conversion from a fractional schedule into an integral one. The analysis of the maximum flow time objective is completely different from that of the k-norms, and hence is discussed on its own in Appendix C.

Fractional Flow Time: In the fractional schedule, there are no separate pieces of information for a page. Hence, when defining a fractional schedule, we only need to specify which page is being processed at each time, together with the requests that are being satisfied. Up to $B_p$ requests for page p can be grouped in a batch for page p. The (fractional) completion time $C_{p,i}$ of a request $J_{p,i}$ is then defined as the first time $t > r_{p,i}$ by which $J_{p,i}$ has been included in a batch for page p for a total of $\ell_p$ units during $[r_{p,i}, t]$. Note that a real (integral) schedule $\sigma$ also defines a fractional completion time $C^{f,\sigma}_{p,i}$ for each request $J_{p,i}$, which is no larger than its integral completion time $C^{\sigma}_{p,i}$. To distinguish the fractional schedule from the integral schedule, we will say that a page is processed rather than broadcast. The advantage of fractional flow time is two-fold. First, it makes potential function based analysis more applicable, particularly in the analysis of the continuous changes of the potential function. Second, it allows us not to worry about restarting a broadcast due to newly arriving requests for the same page. To see this, consider an alive request for page p that has been partially processed. When new requests arrive for the same page p, we would need to decide whether to start broadcasting page p from the beginning or to resume from the piece just after the one by which the old request has been satisfied.

Potential Functions: We show that our online algorithms are fractionally scalable for average flow time and the k-norms of flow time using potential functions. The potential function $\Phi(t)$ changes over time and has both non-continuous and continuous changes. Non-continuous changes can occur when requests arrive or are completed by our algorithm or the optimal scheduler. Continuous changes can occur when our algorithm or the optimal scheduler processes pages; they can also occur due to the elapse of time (this will only be the case for the k-norms of flow time). We can, without loss of generality, compare our algorithm's fractional cost to the optimal scheduler's fractional cost, rather than to the optimal scheduler's integral cost, since the fractional cost can only be smaller. We will show that the potential function satisfies the following properties: (i) $\Phi(0) = \Phi(\infty) = 0$; (ii) all non-continuous changes of $\Phi(t)$ are non-positive; and (iii) for some constants $c_1, c_2 > 0$, at any time t,

$\frac{d}{dt}\Phi(t) \le -c_1 \frac{d}{dt}\mathrm{COST}_A(t) + c_2 \frac{d}{dt}\mathrm{COST}_{\mathrm{OPT}}(t)$.

Here $\mathrm{COST}_A(t)$ denotes the total cumulative cost of our algorithm at time t, and $A(t)$ denotes the set of unsatisfied requests at time t in the schedule A. For the total flow time objective, $\frac{d}{dt}\mathrm{COST}_A(t) = |A(t)|$, and $\mathrm{COST}_A(\infty)$ is the final objective. For the k-norm objective, we let $\frac{d}{dt}\mathrm{COST}_A(t) := \sum_{J_{p,i} \in A(t)} k(t - r_{p,i})^{k-1}$, and $\mathrm{COST}_A(\infty)^{1/k}$ is the final objective in this case. Likewise, $\mathrm{COST}_{\mathrm{OPT}}(t)$ denotes the optimal scheduler's cumulative cost at time t. For more details on the usage of potential functions, see the survey [18]. If the above properties are satisfied, we can easily show that $\mathrm{COST}_A(\infty) \le \frac{c_2}{c_1}\mathrm{COST}_{\mathrm{OPT}}(\infty)$.
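For completeness, here is a short sketch, in our notation, of why properties (i)-(iii) imply this bound; it is the standard amortized local competitiveness argument (cf. [18]): integrate (iii) over time and use (i) and (ii).

\begin{align*}
c_1\,\mathrm{COST}_A(\infty)
  &= \int_0^{\infty} c_1\,\tfrac{d}{dt}\mathrm{COST}_A(t)\,dt \\
  &\le \int_0^{\infty}\Big(c_2\,\tfrac{d}{dt}\mathrm{COST}_{\mathrm{OPT}}(t)-\tfrac{d}{dt}\Phi(t)\Big)\,dt
       && \text{by (iii)} \\
  &= c_2\,\mathrm{COST}_{\mathrm{OPT}}(\infty)
     -\Big(\Phi(\infty)-\Phi(0)-\textstyle\sum_{\text{jumps}}\Delta\Phi\Big) \\
  &= c_2\,\mathrm{COST}_{\mathrm{OPT}}(\infty)+\textstyle\sum_{\text{jumps}}\Delta\Phi
     \;\le\; c_2\,\mathrm{COST}_{\mathrm{OPT}}(\infty)
       && \text{by (i) and (ii)},
\end{align*}

since the integral of the continuous changes of $\Phi$ equals its total change minus the sum of its (non-positive) discrete jumps.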
Online Conversion into an Integral Schedule: We give an online algorithm that converts a fractional schedule into an integral schedule while increasing the flow time of each request by only a constant factor. To do this, when a request $J_{p,i}$ is fractionally completed, we insert the request into a queue together with its lag $\rho_{p,i} := C^f_{p,i} - r_{p,i}$. We then give an online algorithm that satisfies every request $J_{p,i}$ within a constant factor times its lag after it enters the queue. Here the lags are carefully used to prioritize which page to broadcast and which requests should be included in a batch. We give two conversion algorithms, one for unit-size pages and one for varying-size pages; the latter requires more relaxation of the batch size. We note that if the flow time of any request increases by at most a constant factor, then for any of the objectives we consider, the overall objective of the integral schedule is at most a constant factor larger than that of the fractional schedule.

Optimal Schedule's Structure: We make the following observation regarding the structure of an optimal schedule. Note that the following lemma holds for the k-norms of flow time as well as for maximum flow time and average flow time.

Lemma 3.1. Consider any k-norm of flow time for $k \in \{1, 2, 3, \ldots, \infty\}$. There exists an optimal solution in the integral model in which each request $J_{p,i}$ receives each piece $(p, x)$ of page p no later than $J_{p,i+1}$ receives $(p, x)$, for all p, i, x. That is, requests are processed in first-in-first-out order.

Proof. Among all optimal schedules in the integral model, consider a schedule $\sigma$ that minimizes the number of inversions against the first-in-first-out order. Here we say that a pair of requests $J_{p,i}$ and $J_{p,j}$, $i < j$, and a piece of information $(p, x)$ of page p incur an inversion if the later-arriving request $J_{p,j}$ receives the x-th piece of information earlier than $J_{p,i}$ does. We show that there is in fact no inversion in $\sigma$. For the sake of contradiction, suppose otherwise, and fix any pair of requests $J_{p,i}, J_{p,j}$ that have an inversion. Consider changing the way $J_{p,i}$ and $J_{p,j}$ are satisfied in this schedule: for each piece $(p, x)$ of page p, let $J_{p,i}$ be included in the earlier of the two batches in which one of $J_{p,i}$ or $J_{p,j}$ receives $(p, x)$, and let $J_{p,j}$ be included in the later one. Notice that this is a valid schedule since $r_{p,i} \le r_{p,j}$. Further, since the objective function is convex, this change can only reduce the objective, and it reduces the number of inversions, contradicting the choice of $\sigma$. Thus, we can assume that the optimal solution satisfies requests in first-in-first-out order.

Organization: In Section 4, we present our results for average flow time and prove Theorems 1.1 and 1.3. To do this, we first show that a natural variant of the algorithm Latest Arrival Processor Sharing (LAPS) [15, 5] is fractionally scalable. We complete the algorithm and analysis by providing the online conversions into an integral schedule. As mentioned, the two conversions are for the unit-size and varying-size page cases, respectively. Due to space constraints, we defer the analysis of the latter conversion to Appendix A. The k-norms results and the maximum flow time result are presented in Appendices B and C, respectively.

4 Average Flow Time

In this section we consider the problem of minimizing the average flow time of the requests. As mentioned, we consider the objective of fractional flow time first; later we show how to convert the resulting schedule into an integral schedule. For the conversion we consider the unit-size page case and the varying-size page case separately. However, for the fractional case, we assume that pages can have varying sizes.

4.1 Fractional Flow Time

4.1.1 Algorithm

The algorithm we use is an adaptation of the algorithm LAPS [15, 5]. Let $0 < \epsilon < 1/10$ be a fixed constant. We will assume that the algorithm is given speed $s = 1 + 10\epsilon$. The algorithm essentially distributes its processing power amongst the latest-arriving $\epsilon$ fraction of the unsatisfied requests. In particular, let $A(t)$ denote the set of unsatisfied requests at time t in LAPS's schedule, and let $A'(t)$ be the set of the $\lceil \epsilon |A(t)| \rceil$ latest-arriving requests in $A(t)$. Let $J_{q,k}$ be the earliest-arriving request in $A'(t)$. Let $h_{p,i}(t) = \frac{s}{\epsilon |A(t)|}$ for each request $J_{p,i} \in A'(t) \setminus \{J_{q,k}\}$, and for the request $J_{q,k}$ let $h_{q,k}(t) = \frac{s(\epsilon|A(t)| - \lceil \epsilon|A(t)| \rceil + 1)}{\epsilon |A(t)|}$. Intuitively, $h_{p,i}(t)$ denotes the share of processing power request $J_{p,i}$ receives. Note that only the requests in $A'(t)$ receive processing power at time t.

It can be seen that, for all times t, $\sum_{J_{p,i} \in A'(t)} h_{p,i}(t) = s$, the speed the algorithm is given. Now we need to define how the processing power is actually distributed at time t. We assume that our algorithm has 2-factor capacity, i.e., it can serve up to $2B_p$ requests for page p when broadcasting page p. For each request $J_{p,i} \in A'(t)$, let $Q_{p,i}(t) = \{J_{p,j} \in A(t) : i - B_p + 1 \le j \le i + B_p - 1\}$. This is a ball of at most $2B_p$ requests that arrived closest to request $J_{p,i}$; it is the set of requests that $J_{p,i}$ can affect. Each request in $Q_{p,i}(t)$ is processed at a rate of $h_{p,i}(t)$ due to (the share of) request $J_{p,i}$. Thus, a request $J_{p,j} \in A(t)$ for page p is processed at a total rate of $\sum_{J_{p,i} \in A'(t),\, J_{p,j} \in Q_{p,i}(t)} h_{p,i}(t)$ at time t.
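The rate assignment just described can be made concrete with the following sketch of a single time step (Python; the data layout and function name are our own, so this is only an illustration of the description above under those assumptions, not the paper's code).

import math

def laps_rates(alive, B, eps, speed):
    """alive: list of (page, index, arrival) tuples, the set A(t); B[page] = B_p."""
    n = len(alive)
    if n == 0:
        return {}
    m = math.ceil(eps * n)                            # |A'(t)| = ceil(eps * |A(t)|)
    latest = sorted(alive, key=lambda r: r[2])[-m:]   # the m latest-arriving requests
    share = {}
    for idx, req in enumerate(latest):
        if idx == 0:
            # the earliest request in A'(t) gets the leftover share so the shares sum to `speed`
            share[req] = speed * (eps * n - m + 1) / (eps * n)
        else:
            share[req] = speed / (eps * n)
    # each request J_{p,i} in A'(t) pushes its share onto its ball Q_{p,i}(t):
    # all alive requests for the same page whose index is within B_p - 1 of i
    rate = {r: 0.0 for r in alive}
    for (p, i, _), h in share.items():
        for r in alive:
            q, j, _ = r
            if q == p and i - B[p] + 1 <= j <= i + B[p] - 1:
                rate[r] += h
    return rate   # rate[r] is the total rate at which request r is processed at time t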
4.1.2 Analysis

In this section we analyze the algorithm LAPS. Let OPT denote some fixed optimal solution for the fractional setting. For the sake of analysis, we define a potential function that satisfies the three properties described in Section 3. To this end, we need some notation. Let $\mathrm{ON}(t_1, t_2, p, i) = \sum_{t=t_1}^{t_2} h_{p,i}(t)$ be the total amount LAPS devotes to request $J_{p,i}$ during $[t_1, t_2)$. Let $\tau_{p,i}$ be the first time that OPT works on request $J_{p,i}$. Let $\mathrm{OPT}(t_1, t_2, p)$ denote the amount OPT processes page p during $[t_1, t_2)$, up to a maximum of $\ell_p$. Now we create the following variable for each request $J_{p,i}$:

$z_{p,i}(t) = \mathrm{ON}(t, \infty, p, i) \cdot \mathrm{OPT}(\tau_{p,i}, t, p)$.

We are now almost ready to define the potential function. Let $\mathrm{RANK}(p, i, t) = \sum_{J_{q,k} \in A(t),\, r_{q,k} \le r_{p,i}} 1$ be the total number of alive requests in $A(t)$ that arrived no later than $J_{p,i}$. Our potential function is

$\Phi(t) = \frac{5}{\epsilon} \sum_{J_{p,i} \in A(t)} \mathrm{RANK}(p, i, t) \cdot \frac{z_{p,i}(t)}{\ell_p}$.

Recall from Section 3 the properties we need to show. The first property, $\Phi(0) = \Phi(\infty) = 0$, clearly holds. We now bound each of the possible changes in the potential function separately.

Arrival Condition: Consider when $J_{p,i}$ arrives at time t. When a request arrives, the rank of every other request remains unchanged. Further, the term $\mathrm{RANK}(p, i, t) \cdot \frac{z_{p,i}(t)}{\ell_p}$ is added to the potential function; however, $z_{p,i}(t) = 0$ at the time t when $J_{p,i}$ arrives, since OPT has not had a chance to work on this request yet.

Completion Condition: When a request $J_{p,i}$ is completed at time t, the rank of each request that arrived later than $J_{p,i}$ can only decrease. Since every term of the potential function is non-negative, this can only decrease the potential function. Also, the term $\mathrm{RANK}(p, i, t) \cdot \frac{z_{p,i}(t)}{\ell_p}$ is removed from the potential; again, this can only decrease the potential. We have shown that no non-continuous change is positive.

We now focus on the continuous changes of $\Phi(t)$ that can occur due to our algorithm's and OPT's processing.

Running Condition: We first consider the case where OPT processes pages. We can assume without loss of generality that OPT processes at most one page at time t; say that OPT processes page p. Let $O'(t)$ be the set of requests which OPT satisfies by working on page p. Note that $|O'(t)| \le B_p$ by definition of page p's batch size. Further, $O'(t)$ contains at most $B_p$ adjacent requests due to the FIFO nature of OPT proven in Lemma 3.1. Also note that when page p is processed by OPT, the $z_{p,i}$ variable remains unaffected for all requests $J_{p,i}$ with $t < \tau_{p,i}$ and for all requests $J_{p,i}$ that have already been satisfied by OPT. The first case holds because OPT does not start working on such requests before $\tau_{p,i}$ by definition of $\tau_{p,i}$, and hence $\mathrm{OPT}(\tau_{p,i}, t, p)$ for those requests does not increase. The second case holds because $\mathrm{OPT}(\tau_{p,i}, t, p)$ can be at most $\ell_p$, and it must be $\ell_p$ if OPT completed request $J_{p,i}$, so $\mathrm{OPT}(\tau_{p,i}, t, p)$ does not increase at time t. Now, for any request $J_{p,i} \in A(t) \cap O'(t)$ with $\tau_{p,i} \le t$, the variable $z_{p,i}$ increases at a rate of $\mathrm{ON}(t, \infty, p, i)$. We bound this total increase in the following lemma.

Lemma 4.1. If OPT processes page p at time t, then $\sum_{J_{p,i} \in A(t) \cap O'(t),\, \tau_{p,i} \le t} \mathrm{ON}(t, \infty, p, i) \le \ell_p$.

Proof. We know that $O'(t)$ contains at most $B_p$ adjacent requests. Further, by definition of LAPS, if LAPS works on any request in $O'(t)$ then it also works on every other request in $O'(t)$; here the relaxation that LAPS has 2-factor capacity is crucially used. This implies that $\sum_{J_{p,i} \in A(t) \cap O'(t),\, \tau_{p,i} \le t} \mathrm{ON}(t, \infty, p, i)$ can be at most $\ell_p$, because all the requests in $O'(t)$ will be completed in LAPS's schedule once $\ell_p$ amount of processing has been devoted to the requests in $O'(t)$ for page p.

With this lemma, and by the fact that the rank of any request is at most $|A(t)|$, the total increase in the potential due to OPT's processing is at most

$\frac{5}{\epsilon} \sum_{J_{p,i} \in A(t) \cap O'(t),\, \tau_{p,i} \le t} \frac{\mathrm{ON}(t, \infty, p, i) \cdot \mathrm{RANK}(p, i, t)}{\ell_p} \;\le\; \frac{5}{\epsilon} |A(t)|. \quad (1)$

Now we bound the change in the potential due to the algorithm's processing of pages. Notice that for each request $J_{p,i} \in A'(t)$, $\mathrm{ON}(t, \infty, p, i)$ decreases at a rate of $h_{p,i}(t)$; thus, $z_{p,i}(t)$ decreases at a rate of $h_{p,i}(t) \cdot \mathrm{OPT}(\tau_{p,i}, t, p)$. Also, notice that for any request $J_{p,i}$ we have $\mathrm{OPT}(\tau_{p,i}, t, p) = \ell_p$ if OPT has completed $J_{p,i}$ by time t. Let $O(t)$ be the set of alive requests in OPT's schedule at time t. The decrease in $\Phi(t)$ due to the algorithm's processing is at least

$\frac{5}{\epsilon} \sum_{J_{p,i} \in A'(t)} \mathrm{RANK}(p, i, t) \cdot \frac{h_{p,i}(t)\,\mathrm{OPT}(\tau_{p,i}, t, p)}{\ell_p}$
$\ge \frac{5}{\epsilon} (1-\epsilon)|A(t)| \sum_{J_{p,i} \in A'(t)} \frac{h_{p,i}(t)\,\mathrm{OPT}(\tau_{p,i}, t, p)}{\ell_p}$   [$(1-\epsilon)|A(t)| \le \mathrm{RANK}(p, i, t)$ for all $J_{p,i} \in A'(t)$]
$\ge \frac{5}{\epsilon} (1-\epsilon)|A(t)| \sum_{J_{p,i} \in A'(t) \setminus O(t)} h_{p,i}(t)$   [$\mathrm{OPT}(\tau_{p,i}, t, p) = \ell_p$ for all requests $J_{p,i} \notin O(t)$]
$= \frac{5}{\epsilon} (1-\epsilon)|A(t)| \Big( \sum_{J_{p,i} \in A'(t)} h_{p,i}(t) - \sum_{J_{p,i} \in O(t) \cap A'(t)} h_{p,i}(t) \Big)$
$= \frac{5}{\epsilon} (1-\epsilon)|A(t)| \Big( s - \sum_{J_{p,i} \in O(t) \cap A'(t)} h_{p,i}(t) \Big)$   [since $\sum_{J_{p,i} \in A'(t)} h_{p,i}(t) = s$]
$\ge \frac{5}{\epsilon} (1-\epsilon)s|A(t)| - \frac{5s}{\epsilon^2} |O(t)|$   [$h_{p,i}(t) \le \frac{s}{\epsilon |A(t)|}$ by definition of LAPS].

By combining this with (1), we derive the last property we want to show:

$\frac{d}{dt}\Phi(t) \le \frac{5}{\epsilon}|A(t)| - \frac{5}{\epsilon}(1-\epsilon)s|A(t)| + \frac{5s}{\epsilon^2}|O(t)|$
$\le \frac{5}{\epsilon}|A(t)| - \frac{5}{\epsilon}(1-\epsilon)(1+10\epsilon)|A(t)| + \frac{10}{\epsilon^2}|O(t)|$   [$s = 1 + 10\epsilon$ by definition and $s \le 2$]
$\le -|A(t)| + \frac{10}{\epsilon^2}|O(t)|$.

Since $|A(t)|$ and $|O(t)|$ are the instantaneous increases in the LAPS and OPT objectives, respectively, we conclude that our algorithm LAPS is $(1+10\epsilon)$-speed $\frac{10}{\epsilon^2}$-competitive.

4.2 Unit Size Page Conversion

In this section we show how to take the algorithm we presented for minimizing average flow time in fractional batch scheduling and construct an online algorithm for integral batch scheduling such that the integral schedule uses a small amount of extra resource augmentation and no request's flow time increases by more than a constant factor. We note that this conversion also applies to the algorithm we introduce for the k-norms of flow time in Appendix B.

To show this, let A be the algorithm for fractional batch scheduling. Say that A is given $(1+\epsilon)$ speed, where $0 < \epsilon < 1$. (We assumed A has $(1+10\epsilon)$ speed in the previous section, but we can assume it has $(1+\epsilon)$ speed by scaling $\epsilon$.) We construct an algorithm $A'$ for integral batch scheduling. We will assume that $A'$ can schedule one unit-size page in a time step and four extra unit-size pages every $1/\epsilon$ time steps. Note that this is essentially equivalent to $A'$ having $(1+4\epsilon)$ speed.

For the conversion, we cannot use an arbitrary algorithm that performs well for the fractional objective, because we will need some key properties of A to perform the conversion. In particular, the algorithm A is required to group two requests $J_{p,i}$ and $J_{p,j}$ in the same batch only if $i - B_p + 1 \le j \le i + B_p - 1$. More precisely, if this condition is not satisfied, no share that one of the two requests receives can be used to satisfy the other request. This is the way our algorithms are described for both average flow time and the k-norms of flow time.

For our online conversion algorithm, we maintain a queue Q. Whenever a request $J_{p,i}$ is finished by A in the fractional schedule, it is added to the queue Q and waits to be completed in the integral schedule. We let $\rho_{p,i}$ denote the flow time of request $J_{p,i}$ in A's schedule and call $\rho_{p,i}$ the lag of request $J_{p,i}$. At any time, the algorithm $A'$ finds a request $J_{p,i}$ in the queue with the smallest lag $\rho_{p,i}$ (if any) and transmits page p. The batch satisfied by this transmission consists of all the requests $J_{p,j}$ such that $i - B_p + 1 \le j \le i + B_p - 1$; note that the batch size is at most $2B_p$. We will say that $J_{p,i}$ forces $A'$ to schedule page p at this time. Our goal is to show that any request $J_{p,i}$ is completed within $\frac{10}{\epsilon}\rho_{p,i}$ time steps in $A'$'s schedule. This then implies that the average flow time of the requests in the schedule $A'$ is at most a factor $\frac{10}{\epsilon}$ larger than that for A.
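The conversion just described, for unit-size pages, is summarized in the following sketch (Python; the class and method names are our own and the bookkeeping is simplified, so this is only an illustration of the description above).

import heapq

class Converter:
    """Online conversion A' for unit-size pages: serve the queued request with the
    smallest lag and batch its adjacent requests for the same page."""
    def __init__(self, B):
        self.B = B                    # B[page] = B_p
        self.queue = []               # heap of (lag, page, index)
        self.unsatisfied = set()      # (page, index) pairs still alive in A'

    def on_arrival(self, page, index):
        self.unsatisfied.add((page, index))

    def on_fractional_completion(self, page, index, lag):
        # called when A finishes J_{p,i}; lag = rho_{p,i} = C^f_{p,i} - r_{p,i}
        heapq.heappush(self.queue, (lag, page, index))

    def step(self):
        # discard stale entries for requests A' has already satisfied
        while self.queue and self.queue[0][1:] not in self.unsatisfied:
            heapq.heappop(self.queue)
        if not self.queue:
            return None
        lag, p, i = heapq.heappop(self.queue)
        # broadcast page p; the batch is every alive request J_{p,j} with
        # i - B_p + 1 <= j <= i + B_p - 1 (at most 2 * B_p requests)
        batch = [(q, j) for (q, j) in self.unsatisfied
                 if q == p and i - self.B[p] + 1 <= j <= i + self.B[p] - 1]
        self.unsatisfied -= set(batch)
        return p, batch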
Theorem 4.2. Any request $J_{p,i}$ is satisfied by time $r_{p,i} + \frac{10}{\epsilon}\rho_{p,i}$ in $A'$.

For the sake of contradiction, suppose that there exists a request $J_{q,k}$ that has a flow time greater than $\frac{10}{\epsilon}\rho_{q,k}$ in the schedule of $A'$. Let $t^*$ be the first time at which $J_{q,k}$ has waited more than $\frac{10}{\epsilon}\rho_{q,k}$ time steps. Let $t_1$ be the earliest time before $t^*$ such that every request that forces $A'$ to schedule during $[t_1, t^*]$ has lag at most $\rho_{q,k}$. Let $N_p$ be the set of requests that force $A'$ to schedule page p during $[t_1, t^*]$. Our first goal is to show that no two requests in $N_p$ can be satisfied simultaneously at any point in A.

Lemma 4.3. For any page p, no two requests in $N_p$ can be satisfied simultaneously at any point in A.

Proof. Consider any page p and two distinct requests $J_{p,i}, J_{p,j} \in N_p$. Assume without loss of generality that $J_{p,i}$ forces $A'$ to schedule p before $J_{p,j}$ does, and let $t'$ be the time $A'$ satisfies $J_{p,i}$. There are two cases. In the first case, say that $r_{p,j} \le t'$. In this case, the only reason $J_{p,j}$ is not satisfied at time $t'$ is that either $j < i - B_p + 1$ or $j > i + B_p - 1$; that is, $J_{p,j}$ is not included in the batch with $J_{p,i}$. However, this implies that $J_{p,i}$ and $J_{p,j}$ cannot be satisfied simultaneously at any point in A, by definition of A. For the second case, assume that $r_{p,j} > t'$. Note that $t' \ge r_{p,i} + \rho_{p,i}$, since $J_{p,i}$ is added to the queue Q at time $r_{p,i} + \rho_{p,i}$, when $J_{p,i}$ is completed in A. Hence $J_{p,i}$ is completed in A before $J_{p,j}$ arrives, and no processing of $J_{p,i}$ in A can be used to satisfy $J_{p,j}$.

Next we show that no request that forces $A'$ to schedule during $[t_1, t^*]$ arrived too early.

Lemma 4.4. For any request $J_{p,i}$ that forces $A'$ to schedule during $[t_1, t^*]$, it is the case that $r_{p,i} \ge t_1 - \rho_{q,k}$.

Proof. For the sake of contradiction, suppose that there is a request $J_{p,i}$ that arrived before $t_1 - \rho_{q,k}$ and forced $A'$ to schedule during $[t_1, t^*]$. By definition of $A'$, $J_{p,i}$ is available to be scheduled in $A'$ from time $r_{p,i} + \rho_{p,i}$ on. Note from the definition of $t_1$ that $\rho_{p,i} \le \rho_{q,k}$, and hence $r_{p,i} + \rho_{p,i} \le r_{p,i} + \rho_{q,k} < t_1$. However, this implies that every request that forces $A'$ to schedule a page during $[r_{p,i} + \rho_{p,i}, t_1]$ has lag at most $\rho_{p,i} \le \rho_{q,k}$. This contradicts the definition of $t_1$.

Finally, we prove the theorem.

Proof of Theorem 4.2. We know that $A'$ schedules a total of $(1+4\epsilon)(t^* - t_1)$ pages during $[t_1, t^*]$. Further, by Lemma 4.3 the requests that forced these broadcasts cannot be satisfied simultaneously in A, and by Lemma 4.4 they arrived no earlier than $t_1 - \rho_{q,k}$. Finally, these requests must be completed in A by time $t^*$, since no request can force $A'$ to schedule until it is completed in A. Thus, A must do a volume of $(1+4\epsilon)(t^* - t_1)$ work during $[t_1 - \rho_{q,k}, t^*]$, which implies that $(1+4\epsilon)(t^* - t_1) \le (1+\epsilon)(t^* - t_1 + \rho_{q,k})$, i.e., $t^* - t_1 \le \frac{1+\epsilon}{3\epsilon}\rho_{q,k}$. However, we know that $t_1 \le r_{q,k} + \rho_{q,k}$ by definition of $t_1$ and that $t^* - r_{q,k} > \frac{10}{\epsilon}\rho_{q,k}$ by definition of $J_{q,k}$, so $t^* - t_1 > (\frac{10}{\epsilon} - 1)\rho_{q,k}$. This is a contradiction, and the theorem follows.

4.3 Varying Size Page Conversion

In this section we show how to convert a fractionally good schedule into an integral schedule with a small amount of extra speed augmentation so that no request's flow time increases by more than a constant factor. Like the conversion presented in Section 4.2, this conversion can be used for the average flow objective and also for the more general k-norms of flow objective. However, this conversion is more general in that it works for varying-size pages, and this requires more relaxation of the batch size: our online conversion is allowed to use a batch of size up to $6B_p$ for each page p. Furthermore, we require that in the input fractional schedule, two requests $J_{p,i}$ and $J_{p,j}$ can be in the same batch only if $i - B_p + 1 \le j \le i + B_p - 1$, which will be crucial in performing the conversion. Due to space constraints, we present only the conversion algorithm here; the full analysis can be found in Appendix A.

To describe this new conversion, we first set up some notation. Let A denote the fractional input schedule. For simplicity, the reader may assume that A refers to the fractional algorithm we give in Section 4.1 or Appendix B (or, more precisely, to the resulting fractional schedule). Say that A is given $(1+\epsilon)$ speed, where $0 < \epsilon < 1$ (we assumed A has $(1+10\epsilon)$ speed previously, but we can assume it has $(1+\epsilon)$ speed by scaling $\epsilon$). We construct an algorithm $A'$ for integral batch scheduling. We will assume that $A'$ can schedule one unit-size piece in a time step and four extra unit-size pieces every $1/\epsilon$ time steps. Note that this is essentially equivalent to $A'$ having $(1+4\epsilon)$ speed. We will show in Appendix A that any request $J_{p,i}$ is completed within $\frac{10}{\epsilon}\rho_{p,i}$ time steps in $A'$'s schedule. For each request $J_{p,i}$, let $\rho_{p,i}$ denote the flow time of request $J_{p,i}$ in A's schedule; we will call $\rho_{p,i}$ the lag of request $J_{p,i}$.

We group the requests for page p seamlessly into groups of size $2B_p$ based on the order in which they arrive. More precisely, for each page p, let $\mathcal{B}^h_p$ contain the requests $J_{p,j}$ for page p such that $2hB_p + 1 \le j \le 2(h+1)B_p$. For each request $J_{p,i}$, we define the start time of $J_{p,i}$ to be the last time $J_{p,i}$ was included in a batch in which the first piece of page p was transmitted. Note that a request's start time can restart if the first piece of page p is transmitted again and $J_{p,i}$ is included in the batch. Start times will be crucial in our analysis. We also need each request to keep track of the next piece of information it needs to receive. More formally, each request $J_{p,i}$ is associated with a tuple $\langle p, i, s, a \rangle$, where s denotes the start time of $J_{p,i}$ and a denotes the next piece of page p that $J_{p,i}$ needs to receive. Initially, $s = \infty$ and $a = 1$ when $J_{p,i}$ arrives.

Now, at time t, the algorithm $A'$ finds the request $J_{p,i}$ which has the smallest lag amongst those requests that have been completed in the schedule A (and are not yet satisfied in $A'$). The algorithm is going to broadcast a piece of page p at this time. However, we need to be careful about which requests are satisfied in the current batch as well as which piece of page p is broadcast. Let $\langle p, i, s, a \rangle$ be the tuple associated with $J_{p,i}$. Then the a-th piece of page p is broadcast at time t. If $a \ge 2$, then the requests in the batch are all those for page p with start time s.

For each of these requests, we increment the a value in the tuple so that it waits for the following piece of page p. If $a = 1$, then say that $J_{p,i} \in \mathcal{B}^h_p$ for some h. The batch consists of all those requests for p in $\mathcal{B}^{h-1}_p \cup \mathcal{B}^h_p \cup \mathcal{B}^{h+1}_p$ that are currently alive and unsatisfied in $A'$. Every request $J_{p,j}$ in the batch sets its start time s to t and the next piece a to receive to 2 (here a restart can happen). Note that a request in the batch could already have a finite start time, but in this case it simply resets its start time, no matter how much of page p it has already received. At time t, we say that $J_{p,i}$ forces $A'$ to schedule. See Appendix A for the analysis of this algorithm.
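The broadcast step just described can be sketched as follows (Python; all names and the data layout are our own assumptions, and bookkeeping such as removing a request once it has received all $\ell_p$ pieces is omitted).

def group_of(j, Bp):
    # index of the block B^h_p containing request index j (blocks of 2*B_p requests)
    return (j - 1) // (2 * Bp)

def broadcast_step(alive, finished_lag, B, t):
    """alive: dict (p, i) -> {'start': s, 'next': a} for requests not yet satisfied in A'.
    finished_lag: dict (p, i) -> rho_{p,i} for requests already completed in the
    fractional schedule A (only these may force A'). Returns (page, piece, batch)."""
    candidates = [r for r in finished_lag if r in alive]
    if not candidates:
        return None
    p, i = min(candidates, key=lambda r: finished_lag[r])   # smallest lag forces A'
    a = alive[(p, i)]['next']
    if a >= 2:
        s = alive[(p, i)]['start']
        # batch: all alive requests for page p that share this start time
        batch = [r for r in alive if r[0] == p and alive[r]['start'] == s]
        for r in batch:
            alive[r]['next'] += 1            # now waiting for the following piece
    else:
        # a == 1: (re)start. Batch everyone alive for p in this group and the two
        # neighboring groups (at most 6 * B_p requests); they all restart at time t.
        h = group_of(i, B[p])
        batch = [r for r in alive if r[0] == p and abs(group_of(r[1], B[p]) - h) <= 1]
        for r in batch:
            alive[r]['start'] = t
            alive[r]['next'] = 2
    return p, a, batch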

References

[1] S. Acharya, M. Franklin, and S. Zdonik. Dissemination-based data delivery using broadcast disks. Personal Communications, IEEE [see also IEEE Wireless Communications], 2(6):50-60, Dec 1995.

[2] Demet Aksoy and Michael J. Franklin. RxW: A scheduling approach for large-scale on-demand data broadcast. IEEE/ACM Trans. Netw., 7(6):846-860, 1999.

[3] Nikhil Bansal, Moses Charikar, Sanjeev Khanna, and Joseph (Seffi) Naor. Approximating the average response time in broadcast scheduling. In SODA '05: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 215-221, 2005.

[4] Nikhil Bansal, Don Coppersmith, and Maxim Sviridenko. Improved approximation algorithms for broadcast scheduling. SIAM J. Comput., 38(3):1157-1174, 2008.

[5] Nikhil Bansal, Ravishankar Krishnaswamy, and Viswanath Nagarajan. Better scalable algorithms for broadcast scheduling. In ICALP '10: Proceedings of the Thirty-seventh International Colloquium on Automata, Languages and Programming, pages 324-335. Springer, 2010.

[6] Nikhil Bansal and Kirk Pruhs. Server scheduling to balance priorities, fairness, and average quality of service. SIAM J. Comput., 39(7):3311-3335, 2010.

[7] Amotz Bar-Noy, Sudipto Guha, Yoav Katz, Joseph (Seffi) Naor, Baruch Schieber, and Hadas Shachnai. Throughput maximization of real-time scheduling with batching. ACM Trans. Algorithms, 5(2):18:1-18:17, March 2009.

[8] Yair Bartal and S. Muthukrishnan. Minimizing maximum response time in scheduling broadcasts. In SODA '00: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 558-559. SIAM, 2000.

[9] Ho-Leung Chan, Jeff Edmonds, and Kirk Pruhs. Speed scaling of processes with arbitrary speedup curves on a multiprocessor. In SPAA '09: Proceedings of the Twenty-first Annual ACM Symposium on Parallel Algorithms and Architectures, pages 1-10, 2009.

[10] Jessica Chang, Thomas Erlebach, Renars Gailis, and Samir Khuller. Broadcast scheduling: algorithms and complexity. In SODA '08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 473-482, 2008.

[11] Chandra Chekuri, Avigdor Gal, Sungjin Im, Samir Khuller, Jian Li, Richard Matthew McCutchen, Benjamin Moseley, and Louiqa Raschid. New models and algorithms for throughput maximization in broadcast scheduling (extended abstract). In WAOA '10: Proceedings of the Eighth Workshop on Approximation and Online Algorithms, pages 71-82. Springer, 2010.

[12] Chandra Chekuri, Sungjin Im, and Benjamin Moseley. Online scheduling to minimize maximum response time and maximum delay factor. Theory of Computing, 8(7):165-195, 2012.

[13] Jeff Edmonds, Donald D. Chinn, Tim Brecht, and Xiaotie Deng. Non-clairvoyant multiprocessor scheduling of jobs with changing execution characteristics. J. Scheduling, 6(3):231-250, 2003.

[14] Jeff Edmonds, Sungjin Im, and Benjamin Moseley. Online scalable scheduling for the $\ell_k$-norms of flow time without conservation of work. In SODA '11: Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, pages 109-119, 2011.

[15] Jeff Edmonds and Kirk Pruhs. Scalably scheduling processes with arbitrary speedup curves. In SODA '09: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 685-692, 2009.

[16] Regant Y. S. Hung and Hing-Fung Ting. Design and analysis of online batching systems. Algorithmica, 57(2):217-231, 2010.

[17] Sungjin Im and Benjamin Moseley. An online scalable algorithm for average flow time in broadcast scheduling. In SODA '10: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2010.

[18] Sungjin Im, Benjamin Moseley, and Kirk Pruhs. A tutorial on amortized local competitiveness in online scheduling. SIGACT News, 42(2):83-97, 2011.

[19] Bala Kalyanasundaram and Kirk Pruhs. Speed is as powerful as clairvoyance. J. ACM, 47(4):617-643, 2000.

[20] Bala Kalyanasundaram, Kirk Pruhs, and Mahendran Velauthapillai. Scheduling broadcasts in wireless networks. Journal of Scheduling, 4(6):339-354, 2000.

[21] Jae-Hoon Kim and Kyung-Yong Chwa. Scheduling broadcasts with deadlines. Theor. Comput. Sci., 325(3):479-488, 2004.

[22] M. Mathirajan and A. I. Sivakumar. A literature review, classification and simple meta-analysis on scheduling of batch processors in semiconductor. The International Journal of Advanced Manufacturing Technology, 29(9-10):990-1001, 2006.

[23] Kirk Pruhs, Julien Robert, and Nicolas Schabanel. Minimizing maximum flowtime of jobs with arbitrary parallelizability. In WAOA '10: Proceedings of the Eighth Workshop on Approximation and Online Algorithms, pages 237-248, 2010.

[24] J. Wong. Broadcast delivery. Proceedings of the IEEE, 76(12):1566-1577, 1988.

A Analysis of the Online Conversion to an Integral Schedule for Varying-Size Pages

In this section, we analyze the online conversion algorithm presented in Section 4.3 and prove the following theorem.

Theorem A.1. In the schedule $A'$, any request $J_{p,i}$ is satisfied by time $r_{p,i} + \frac{10}{\epsilon}\rho_{p,i}$.

For the sake of contradiction, suppose that there exists a request $J_{q,k}$ that waits longer than $\frac{10}{\epsilon}\rho_{q,k}$ time in $A'$ to be satisfied. Let $t^*$ be the first time at which $J_{q,k}$ has waited more than $\frac{10}{\epsilon}\rho_{q,k}$ time steps. Let $t_1$ be the earliest time before $t^*$ such that every request that forces $A'$ to schedule during $[t_1, t^*]$ has lag at most $\rho_{q,k}$. Let $N_p$ be the set of requests that force $A'$ to schedule page p during $[t_1, t^*]$, and let $S_p$ be the set of start times for page p which occur during $[t_1, t^*]$. Notice that $|N_p| = |S_p|$. It is important to note that not every request which forces $A'$ to process a page during $[t_1, t^*]$ has a start time during this interval. In particular, some requests can start being satisfied in $A'$'s schedule before they are completed in A's schedule (recall that they cannot force $A'$ to schedule until they are completed by A). It is this set of requests that most of the proof will need to focus on. Let $N'_p$ be the set of requests that force $A'$ to schedule page p during $[t_1, t^*]$ whose start time is before $t_1$, and let $S'_p$ contain these start times. Note that $S'_p$ in particular contains the last time a request in $N'_p$ was started before $t_1$.

Our proof can be summarized as follows. We will first show that the total volume of work $A'$ does during $[t_1, t^*]$, which is $(1+4\epsilon)(t^* - t_1)$, is at most $\sum_p |S_p| \ell_p + \sum_p |S'_p| \ell_p$. We will then show that the total amount of work A must do during $[t_1 - \rho_{q,k}, t^*]$ is at least $\sum_p |S_p| \ell_p$, and that $\sum_p |S'_p| \ell_p \le 8\rho_{q,k}$. Given that $A'$ has more speed than A, this will be sufficient to derive a contradiction to $J_{q,k}$'s large waiting time.

Proposition A.2. Consider any start time $s \in S_p \cup S'_p$, and count the total number of pieces of page p that are scheduled during $[t_1, t^*]$ when a request whose final start time is s forces $A'$ to broadcast. This number is at most $\ell_p$.

Proof. The proposition follows easily from the definition of $A'$. All requests for page p having the same (final) start time s are started (or restarted) at the same time s, and continue to be satisfied together whenever one of those requests forces $A'$ to broadcast a piece of page p. In other words, the broadcast piece is the right one for all of those requests.

Lemma A.3. Any request that forces $A'$ to schedule during $[t_1, t^*]$ arrives no earlier than $t_1 - \rho_{q,k}$.

Proof. For the sake of contradiction, say that there is a request $J_{p,i}$ that arrived before $t_1 - \rho_{q,k}$ and forced $A'$ to schedule during $[t_1, t^*]$. By definition of $A'$, $J_{p,i}$ is available to be scheduled in $A'$ from time $r_{p,i} + \rho_{p,i} \le r_{p,i} + \rho_{q,k} < t_1$ on. However, this implies that every request scheduled during $[r_{p,i} + \rho_{p,i}, t_1]$ has lag at most $\rho_{p,i} \le \rho_{q,k}$. This contradicts the definition of $t_1$.

Lemma A.4. For any page p, two distinct requests $J_{p,i}$ and $J_{p,j}$ in $N_p$ cannot be satisfied simultaneously at any point in A.

Proof. Fix some page p and consider any two distinct requests $J_{p,i}, J_{p,j} \in N_p$. Without loss of generality, assume that the first time $J_{p,i}$ forces $A'$ to schedule p is before the first time $J_{p,j}$ does. Let $t'$ be the first time $A'$ is forced to schedule by $J_{p,i}$. There are two cases. In the first case, say that $r_{p,j} \le t'$, and say that $J_{p,i} \in \mathcal{B}^h_p$. In this case, the only reason $J_{p,j}$ is not started at time $t'$ is that $J_{p,j} \notin \mathcal{B}^{h-1}_p \cup \mathcal{B}^h_p \cup \mathcal{B}^{h+1}_p$. However, this implies that either $j < i - B_p + 1$ or $j > i + B_p - 1$; that is, $J_{p,j}$ is never included in a batch with $J_{p,i}$ at any point in A's schedule. Now consider the second case, $r_{p,j} > t'$. We know that $t' \ge r_{p,i} + \rho_{p,i}$, since no request can force $A'$ to schedule until it is completed in A, and A completes $J_{p,i}$ at time $r_{p,i} + \rho_{p,i}$. This implies that $J_{p,i}$ is completed in A before $J_{p,j}$ arrives. Hence the claim follows.
That is, J p,j is not inclue in the batch with J p,i at any point in A s scheule. Now consier the secon case that r p,j > t. We know that t r p,i + ρ p,i since no request can force A to scheule until it is complete in A. However, we know that A completes J p,i at time r p,i + ρ p,i. This implies that J p,i is complete in A before J p,j arrives. Hence the claim follows. 14