IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 11, NOVEMBER 2006

Load Balancing in a Cluster-Based Web Server for Multimedia Applications

Jiani Guo, Student Member, IEEE, and Laxmi Narayan Bhuyan, Fellow, IEEE

Abstract: We consider a cluster-based multimedia Web server that dynamically generates video units to satisfy the bit rate and bandwidth requirements of a variety of clients. The media server partitions the job into several tasks and schedules them on the backend computing nodes for processing. For stream-based applications, the main design criteria of the scheduling are to minimize the total processing time and maintain the order of media units for each outgoing stream. In this paper, we first design, implement, and evaluate three scheduling algorithms, First Fit (FF), Stream-based Mapping (SM), and Adaptive Load Sharing (ALS), for multimedia transcoding in a cluster environment. We determined that it is necessary to predict the CPU load for each multimedia task and schedule them accordingly due to the variability of the individual jobs/tasks. We, therefore, propose an online prediction algorithm that can dynamically predict the processing time per individual task (media unit). We then propose two new load scheduling algorithms, namely, Prediction-based Least Load First (P-LLF) and Prediction-based Adaptive Partitioning (P-AP), which can use prediction to improve the performance. The performance of the system is evaluated in terms of system throughput, out-of-order rate of outgoing media streams, and load balancing overhead through real measurements using a cluster of computers. The performance of the new load balancing algorithms is compared with all other load balancing schemes to show that P-AP greatly reduces the delay jitter and achieves high throughput for a variety of workloads in a heterogeneous cluster. It strikes a good balance between the throughput and output order of the processed media units.

Index Terms: Online prediction, partial predictor, global predictor, adaptive partitioning, prediction-based load balancing, out-of-order rate.

J. Guo is with Cisco Systems, Inc., 170 West Tasman Dr., San Jose, CA. L.N. Bhuyan is with the Department of Computer Science and Engineering, University of California, Riverside, CA. Manuscript received 16 Apr. 2004; revised 23 Feb. 2005; accepted 22 Dec. 2005; published online 26 Sept. 2006. Recommended for acceptance by J. Srivastava.

1 INTRODUCTION

SEVERAL applications over the Internet involve processing of secure, computation-intensive, multimedia, and high-bandwidth information. Many of these applications require large-scale scientific computing and high-bandwidth transmission at the server nodes. The current generation of Internet servers is mostly based on either a general-purpose symmetric multiprocessor or a cluster-based homogeneous architecture. As we attempt to scale such servers to high levels of performance, availability, and flexibility, the need for more sophisticated software architectures becomes obvious. Additionally, contemporary distributed architectures have limited abilities to handle overloads, load imbalances, and compute-intensive transactions like cryptographic applications and multimedia processing. In this paper, we consider a scalable distributed system architecture, shown in Fig. 1, where the major functionalities required in the Internet servers (SSL, HTTP, script and cryptographic processing, database management, multimedia processing, etc.) are partitioned into parallel tasks and backend computing servers are allocated based on their needs. In this paper, we consider multimedia processing as the example.

Since Internet clients may vary greatly in their hardware resources, software sophistication, and quality of connectivity, different clients require different media streaming services. A promising solution to the problem is to use transcoding to customize the size of objects and distribute the available network bandwidth among various clients [1]. This method is called on-demand transcoding, which is used to convert a multimedia object from one form to another. On-demand transcoding (distillation) has been proposed to transform media streams in the active routers [2], [3], [4] or proxy servers [5], [6], [7] to adapt media streams to fluctuating network conditions. Any client intending to request a media stream first contacts the media server, as shown in Fig. 1. If the media data in storage satisfies the requirements, the media server supplies the data. If on-demand transcoding is needed, the media server retrieves data, divides them into several tasks, and distributes the tasks among computing servers for transcoding. The transcoded data is transferred back to the media server and then delivered to the clients. Due to the variety of clients, different streams may require different transcoding operations and, therefore, produce a variety of individual transcoding jobs to be scheduled among the computing servers. In order to provide real-time transcoding service, a good scheduling algorithm needs to be employed that can predict the CPU load for each job and then schedule accordingly.

Load balancing is a critical issue in parallel and distributed systems to ensure fast processing and good utilization. A detailed survey of general load balancing algorithms is provided in [8]. Although a plethora of load-balancing schemes have been proposed, simple static policies, such as the random distribution policy [9] or the modulus-based round-robin policy [10], are adopted in practice because they are easy to implement. However, these schemes do not work well for heterogeneous processors or variation in the task processing times. On the other hand, adaptive load balancing policies are usually complicated

and require prediction of computation time for any incoming requests [11]. They are difficult to implement, and produce increased communication overhead due to feedback requirements from the processors. Moreover, the load balancing techniques proposed for general parallel systems cannot be directly applied to our media cluster because there are additional requirements like reducing jitter. Jitter is defined as the standard deviation of the interdeparture time among media units. High jitter is detrimental to the playback quality, which is the main concern of media clients. The interdeparture time among units of a multimedia stream is reduced through parallel transcoding on the computing servers in the cluster. That gives rise to an increase in the out-of-order departure of the packets, thus producing high jitter. Hence, proper scheduling with a good load balancing algorithm must be designed to deliver a suitable balance between high throughput and low jitter.

Fig. 1. Cluster-based Web server architecture.

A few researchers have developed load scheduling algorithms for cluster-based Web servers. Zhu et al. proposed an elegant scheduling algorithm to provide differentiated service to multiple service classes of generic Web requests [12]. Li et al. [13] implemented a generalized Web request distribution system called Gage. A Least Load First (LLF) policy is employed, where the server with the least load is chosen to process a request. However, their work focuses on generic Web requests instead of the multimedia jobs considered in this paper.

We have designed and proposed a set of scheduling algorithms for parallel multimedia applications. In order to develop and evaluate the proposed load balancing schemes, we implement a Linux-based media cluster over Gigabit Ethernet and develop a multithreaded software architecture to schedule multimedia jobs for transcoding in the cluster. A multithreaded software architecture can overlap communication with computation and can achieve maximum efficiency. We make the following contributions in this paper:

1. We implement a media cluster and do experiments to compare the performance of three load balancing algorithms, namely, First Fit (FF), Stream-based Mapping (SM), and Adaptive Load Sharing (ALS). The SM and ALS schemes were designed by us [14], and are described in detail in Section 2. We also present more results in this paper for a number of movies with different transcoding requirements.

2. The above algorithms are based on the average computational requirement of a multimedia unit. Since the computation may vary from time to time, we design a prediction algorithm in Section 3 to dynamically predict the transcoding time for each media unit. The predicted time is used to distribute the incoming workload to computing servers accordingly.

3. Incorporating the prediction scheme, we propose three new load balancing algorithms in Section 4, namely, Prediction-based Least Load First (P-LLF), Adaptive Partitioning (AP), and Prediction-based Adaptive Partitioning (P-AP). P-LLF extends the LLF algorithm using prediction. Adaptive Partitioning (AP) is a new algorithm that reduces jitter by dynamically mapping each stream to a subset of servers. P-AP extends AP by employing prediction to compute the requirement.

4. We do experiments to compare the performance of the above seven load balancing algorithms, namely, FF, SM, ALS, LLF, P-LLF, AP, and P-AP. Experimental results are presented in Section 5 to show that the prediction-based algorithms produce better throughput and less out-of-order departure for a number of media streams with different requirements.

2 NONPREDICTION-BASED LOAD BALANCING TECHNIQUES

In this section, we present three nonprediction-based load balancing algorithms that we have developed for a media cluster. Preliminary results applying only one transcoding operation were presented earlier by us in [14]. We extend the algorithms to different transcoding operations.

2.1 First Fit (FF)

With First Fit, the media server searches for an available computing server in a round-robin way when scheduling media units. It always chooses the first available one to dispatch a media unit. To avoid collecting feedback from servers, we build a dispatch queue on the media server for each computing server and let these dispatch queues be indicators of their load status. To schedule a media unit, the dispatch queues are polled in a round-robin way. The unit is scheduled to the first computing server whose corresponding queue has a vacancy. If all queues are full, overload is indicated on all servers and the unit will not be scheduled until one of the queues is drained. The load on a server is affected both by the complexity of the transcoding operations and by the sizes of the units. The server with the higher load usually drains its dispatch queue more slowly and leaves fewer vacancies in the queue. Consequently, heavily loaded servers are likely to get fewer media units for processing than the lightly loaded servers. Therefore, the loads among servers are naturally balanced to some extent. This is a simple way to take heterogeneous servers into account without the help of an extra load analyzer. However, the media units of the same stream are most likely distributed to different servers, resulting in high delay jitter for each stream at its destination. Nevertheless, as shown in [14], FF generates higher throughput than the simple round-robin scheme presented in [3].
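To make the dispatch-queue mechanism concrete, the following minimal Python sketch shows one way the FF loop described above could be organized. The queue capacity, the class and method names, and the polling structure are illustrative assumptions of this sketch, not the authors' implementation.

    from collections import deque

    class FirstFitScheduler:
        """First Fit sketch: poll per-server dispatch queues round-robin and place
        the unit in the first queue with a vacancy."""

        def __init__(self, num_servers, queue_capacity=8):   # capacity is an assumed parameter
            self.queues = [deque() for _ in range(num_servers)]
            self.capacity = queue_capacity
            self.next_server = 0                              # round-robin starting point

        def schedule(self, media_unit):
            n = len(self.queues)
            for step in range(n):
                server = (self.next_server + step) % n
                if len(self.queues[server]) < self.capacity:  # first queue with a vacancy
                    self.queues[server].append(media_unit)
                    self.next_server = (server + 1) % n
                    return server
            return None   # all queues full: caller retries after some queue drains

        def on_server_ready(self, server):
            """Called when a computing server asks for work; drains that server's queue."""
            return self.queues[server].popleft() if self.queues[server] else None

Because a heavily loaded server drains its queue more slowly, it naturally receives fewer units over time, which is the implicit load balancing effect described above.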

2.2 Stream-Based Mapping (SM)

The problem with FF is that media units are distributed to many servers, which causes large out-of-order delivery of the units. To preserve the computation order among media units, as well as to keep the simplicity of FF, a stream-based mapping algorithm can be employed. The unit is mapped to a server according to the function f(c) = c mod N, where c is the stream number to which the unit belongs and N is the total number of servers in the cluster. Therefore, all the units belonging to one stream will be sent to the same server. We have shown in [14] that this scheme works most efficiently in a cluster with homogeneous servers and for some specific input patterns. Assuming there are M streams and N servers, the input workload must satisfy the condition M ≥ N and M is a multiple of N.

2.3 Adaptive Load Sharing (ALS) Policy

A number of adaptive load sharing policies are proposed in the literature [8]. However, we are unaware of any real implementation because of the complexity and overhead of the ALS algorithms. Our analysis indicated that the extended HRW technique [15], proposed for network applications, offers a reasonable balance between the throughput and out-of-order departures. While aiming at delivering high throughput per flow, the ALS policy also minimizes the probability of units belonging to the same flow being treated by different processors and so minimizes the out-of-order rate. Hence, we implemented it to schedule multiple media streams in our system, as described below.

According to the ALS policy, a media unit is mapped to a particular server according to the function f(\vec{v}) = j, which is defined as

    x_j \, g(\vec{v}, j) = \max_{k \in \{1, \ldots, N\}} x_k \, g(\vec{v}, k),    (1)

where \vec{v} is the identifier vector of the unit, which helps identify the particular flow the unit belongs to, j is the server node to which the unit will be mapped for processing, g(\vec{v}, j) is a pseudorandom function which produces random variables in (0,1) with uniform distribution, and (x_1, x_2, ..., x_N) is a weight vector that describes the processor utilization for each server.

The weight vector (x_1, x_2, ..., x_N) is dynamically adapted according to the system behavior through periodic feedback. Here is how the adaptation works. The media server periodically gathers information from each server about its utilization and checks whether the adaptation threshold is exceeded. If the threshold is exceeded, the media server adjusts the weights. In the feedback report, a smoothed, low-pass filtered processor utilization measure of the following form is used to calculate the utilization of each server, \bar{\rho}_j(t), by gathering the load statistics information \rho_j(t) periodically:

    \bar{\rho}_j(t) = \frac{1}{r}\rho_j(t) + \frac{r-1}{r}\bar{\rho}_j(t - \Delta t).    (2)

Similarly, the total system utilization is measured as \bar{\rho}(t) = \frac{1}{r}\rho(t) + \frac{r-1}{r}\bar{\rho}(t - \Delta t). The adaptation algorithm consists of a triggering policy and an adaptation policy. Once the triggering condition is reached, adaptation is applied to the weights of the involved servers.

To implement the ALS algorithm, there are two issues left open to implementers. One is the pseudorandom function g(\vec{v}, j). Kencl and Boudec [15] suggest implementing g(\vec{v}, j) using the hash function h_{\phi_1}(y) = (\phi_1 y) \bmod 1, which is based on the Fibonacci golden ratio multiplier \phi_1 = (\sqrt{5} - 1)/2, such that

    g(\vec{v}, j) = h_{\phi_1}(\vec{v} \ \mathrm{XOR} \ h_{\phi_1}(j)).    (3)

The other open issue is how to measure the load of each processor. In our experiments, we adopt the above function g(\vec{v}, j) and define the load indicator \rho_j(t) as

    \rho_j(t) = t^{\mathrm{task}}_j / \Delta t,    (4)

where t^{\mathrm{task}}_j is the CPU time spent by the transcoding services during the polling interval \Delta t. \rho(t) is defined as

    \rho(t) = \sum_{j=1}^{N} t^{\mathrm{task}}_j / (N \Delta t) = \frac{1}{N}\sum_{j=1}^{N} \rho_j(t).    (5)

The identifier \vec{v} is chosen to be the stream number of the media unit. Therefore, during each monitoring epoch, the mapping function (1) is calculated and a static mapping between the streams and the servers is determined. When a change in load distribution is reported by the computing servers at the end of an epoch, the weight vector is changed and the mapping is adjusted to rebalance the loads among servers. The new mapping takes effect in the next epoch. We have shown in [14] that ALS reduces departure jitter for multiple streams. In spite of high overheads to collect feedback information, the ALS scheme produces a good throughput. More results and comparisons are given in Section 5 of this paper.
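As a concrete illustration of the mapping in (1) and (3), the Python sketch below computes the golden-ratio hash and picks the server with the maximal weighted hash value. The function names, the example weights, and the 32-bit scaling used to make the XOR well defined are assumptions of this sketch rather than the paper's implementation.

    import math

    PHI = (math.sqrt(5.0) - 1.0) / 2.0   # Fibonacci golden ratio multiplier, about 0.618

    def h_phi(y):
        """h_phi(y) = (phi * y) mod 1, a pseudorandom value in (0, 1)."""
        return (PHI * y) % 1.0

    def g(stream_id, server_id):
        """g(v, j) = h_phi(v XOR h_phi(j)); the hash of j is scaled to an integer
        before the XOR so the operation is well defined (an assumption here)."""
        hj = int(h_phi(server_id) * (1 << 32))
        return h_phi(stream_id ^ hj)

    def als_map(stream_id, weights):
        """Map a unit of stream_id to the server maximizing x_j * g(v, j), per (1)."""
        return max(range(len(weights)), key=lambda j: weights[j] * g(stream_id, j))

    # Example: three servers whose weights would be periodically adapted from the
    # smoothed utilizations in (2); the numbers below are made up for illustration.
    weights = [1.0, 0.8, 1.2]
    print(als_map(stream_id=7, weights=weights))

Because the hash depends only on the stream identifier and the server index, all units of a flow map to the same server until the weight vector changes, which is what keeps the out-of-order rate low.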

3 PREDICTING PROCESSING TIME

In the load balancing algorithms presented so far, the variability of individual jobs has not been explicitly considered. Because the transcoding time of media units in a stream or among streams varies a lot due to the wide variation in scenes and motion relations, the ability to predict how much CPU load a job may consume is essential for building a good scheduling scheme. In the cluster-based Web server system Gage [13], the CPU time consumed by client requests is predicted as the weighted average time of the processed requests. The prediction is used to predict the load on each server and to distribute the incoming workload among servers according to the Least Load First (LLF) policy. However, such a simple prediction scheme is not suitable for multimedia transcoding because the transcoding time differs among transcoding operations even for the same stream.

During the past decade, a number of video-coding standards have been developed for communicating video data. These standards include MPEG-1 for playback from CD-ROM, MPEG-2 for DVD and satellite broadcast, and H.261/263 for videophone and video conferencing. Newer video coding standards, such as MPEG-4, have also emerged. For all these video-coding standards, four video resolution criteria are usually used in commercial products, as illustrated in Table 1.

TABLE 1 Four Resolution Criteria in MPEG Specification

TABLE 2 MPEG Movies for Transcoding

Given these video resolution criteria, the common transcoding operations fall into three types: changing the bit rate, resizing frames, and changing the frame rate among the four resolution criteria. Most data streaming formats contain periodic zero-state resynchronization points for increased error resilience, effectively segmenting the stream into independent blocks, which we call media units [4]. For instance, in an MPEG-1/2 stream, a media unit can be a group of pictures (GOP) that is decoded independently. Since most transformations maintain the independence of media units, the transformation of a single media unit can be considered an independent processing job which can be scheduled onto any computing server in the cluster. The only interjob dependence is the processing order of consecutive media units in the same media stream.

Bavier et al. [16] built a model to predict the MPEG decoding time at the frame level. They found that it is possible to construct a linear model for MPEG decoding with R^2 values of 0.97 considering both frame type and size. In statistics, the correlation coefficient R indicates the extent to which the pairs of numbers for two variables lie on a straight line. The strength of the relationship between X and Y is theoretically expressed by squaring the correlation coefficient R and multiplying by 100, which is known as the variance explained, R^2. For example, a correlation R of 0.9 means R^2 = 0.9^2 = 0.81, i.e., 81%. Hence, 81 percent of the variance in Y is explained or predicted by the X variable. Experimental results [16] show that the model can be used to predict the execution time to within 25 percent of the time actually taken to decode frames.

In this section, we first model the relationship between the transcode time and the unit size by statistically analyzing a set of experimental results for a specific movie and a specific transcoding operation. Based on the model, we develop a prediction algorithm to dynamically predict the transcode time of a media unit.

3.1 Modeling the Relationship between the Transcode Time and the GOP Size

Transcoding of an MPEG GOP does not consume a constant amount of processing, due in part to the fact that a GOP is composed of different frame types and in part to the wide variation between scenes and different motion relations among frames. Each GOP has three kinds of frames: I frames (intraframe), P frames (predictive frame), and B frames (bidirectional predictive frame). I frames are self-contained, complete images. P and B frames are encoded as differences from other reference frames. Due to the different motion relations existing among frames, a GOP may contain different numbers of I, P, and B frames. This makes prediction of the time to transcode the next GOP based on past behavior difficult.

We do a set of experiments to observe the transcoding time of two different movies, as illustrated in Table 2. For each movie, three operations are performed: changing the bit rate, reducing the frame rate, and resizing the frame. Fig. 2a plots the transcode time as a function of GOP size when the bit rate of Lord of the Rings is reduced to 50 kbps. The straight line in the figure is the linear regression line, obtained by statistically analyzing the data set. Fig. 3a demonstrates the scatterplot of the transcode time as a function of GOP size for the movie The Matrix. For each movie, we also plotted the transcode time for the other two transcoding operations: changing the frame rate and resizing frames. However, due to the paper length limit, we do not give the scatterplots here.

Fig. 2. Transcode time versus GOP size (movie: Lord of the Rings; operation: reducing bit rate to 50 kbps). (a) Tentative linear regression modeling. (b) Linear regression modeling using means.

Fig. 3. Transcode time versus GOP size (movie: The Matrix; operation: reducing bit rate to 50 kbps). (a) Tentative linear regression modeling. (b) Linear regression modeling using means.

For both movies and the three transcoding operations, the regression equations for the tentative linear regression analysis are given in Table 3. In the table, the GOP size is measured in KBs and the transcode time is measured in milliseconds. From the R^2 values, we notice that a linear model cannot adequately describe the relationship between the transcode time and the GOP size. However, as the scatterplot in Fig. 2a suggests, the transcode time does increase as the GOP size increases. The difficulty in fitting a linear model is caused by the wide variation of the transcode time for a given GOP size. Thus, for ease of analysis, we divide the GOP size into regions, as shown in Table 4. For each region, the average of the GOP transcoding time is calculated, as shown in Fig. 2b, where each point in the figure has the regional mean value of the GOP size as its x value and the regional mean value of the transcode time as its y value. For the scatterplot drawn in Fig. 2b, a linear regression line with an R^2 value as high as 99 percent is obtained. The corresponding linear equation is given in Table 5. In Table 5, the GOP size is in KBs and the transcode time is in milliseconds. Therefore, we have obtained a good linear model to describe the relationship between the transcode time and the GOP size considering a specific movie and a specific transcoding operation.

TABLE 3 Tentative Linear Regression Modeling

TABLE 4 Regions of GOP Size in Terms of Bytes

TABLE 5 Linear Regression Modeling Using Regional Means
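The regional-means fit of Section 3.1 can be reproduced with a few lines of code. The Python sketch below groups (GOP size, transcode time) samples into size regions, averages each region, and fits a least-squares line through the regional means. The region boundaries and variable names are illustrative assumptions, since Table 4 is not reproduced here.

    def fit_regional_means(samples, boundaries):
        """samples: list of (gop_size, transcode_time); boundaries: ascending region
        upper bounds in bytes (per Table 4; the concrete values must be supplied)."""
        regions = {}
        for size, time in samples:
            i = next((k for k, b in enumerate(boundaries) if size <= b), len(boundaries))
            regions.setdefault(i, []).append((size, time))

        # One (mean size, mean time) point per nonempty region.
        xs = [sum(s for s, _ in pts) / len(pts) for pts in regions.values()]
        ys = [sum(t for _, t in pts) / len(pts) for pts in regions.values()]

        # Ordinary least-squares slope and intercept over the regional means
        # (assumes at least two distinct regional mean sizes).
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
                sum((x - mx) ** 2 for x in xs)
        return slope, my - slope * mx   # transcode time ~ slope * size + intercept

Averaging within regions removes most of the per-GOP variation, which is why the fit over the means reaches the high R^2 value reported above while the raw samples do not.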

3.2 Predicting Execution Time on a Single PC

Based on the linear model built in the previous section, we can estimate the transcode time of the GOPs processed so far and incrementally build a predictor to predict the execution time for the next GOP. As presented by Bavier et al. [16], building a linear predictor based on the canonical least squares algorithm would be computationally too expensive for the scheduling purpose. They designed a predictor that approximates the linear model. Through experimental results, they also verified that the predictor works even better than the linear model. In this paper, we adopt the same method as theirs to build a predictor that approximates the linear model presented in Section 3.1. However, because our model differs from theirs in that the regional means are used, our predictor is built differently from their predictor.

Table 6 defines all the parameters used to predict the transcode time. The prediction is carried out in two separate steps. One is to incrementally build the predictor based on the behavior of the GOPs processed so far. The other is to predict the transcode time for a given GOP. The Predictor is initialized as (0, Default, 0, 0, 0). Once a GOP is processed, the GOP size and its transcode time, (size, time), are recorded and the Predictor is updated accordingly.

TABLE 6 Parameters Used in Prediction

The basic idea is that the linear slope, (xdiff, ydiff), is incrementally approximated according to the difference between the accumulated regional means, (mRegTime[i], mRegSize[i]), and the accumulated global means, (mtime, msize), after enough units have been processed. The procedure can be described step by step as follows: First, the region i to which the GOP belongs is found, and (mRegTime[i], mRegSize[i]) is updated. Then, (xdiff, ydiff) is updated only when two conditions are both met. One is that enough samples have been accumulated, i.e., DistinctRegs ≥ EnoughRegs. The other is that (mRegTime[i] - mtime) shows the same increasing or decreasing tendency as (mRegSize[i] - msize). Finally, msize, mtime, samples, and DistinctRegs are updated to count the newly processed unit into the accumulated values. Fig. 13 illustrates this procedure.

The transcode time of a GOP of size size is predicted according to the Predictor as

    prediction = \begin{cases} Default, & samples = 0, \\ mtime, & samples > 0,\ xdiff = 0, \\ mtime + (size - msize) \cdot ydiff / xdiff, & samples > 0,\ xdiff > 0. \end{cases}

Fig. 14 illustrates the prediction algorithm.

3.3 Predicting Execution Time on a Set of Heterogeneous PCs

To process a stream in parallel on a set of heterogeneous machines, prediction becomes more complicated. Two new questions emerge. First, how do we build a predictor when each server only processes part of the units that are distributed to it? Second, given a predictor of the stream, how do we predict the execution time of a given GOP on each specific server when the servers possess different power?

We propose building the predictor as follows: Each server incrementally builds its own predictor, which we call a partial predictor, based on the information of the GOPs it has processed so far. The scheduler periodically collects the partial predictors from all computing servers and combines them into a global predictor. The partial predictors are constructed independently on each computing server according to the algorithm illustrated in Fig. 13. Table 7 defines the symbols used throughout the rest of the paper.

TABLE 7 Definition of the Predictors

Let there be N computing servers. When the scheduler collects the partial predictors, it generates the global predictor, as demonstrated in Fig. 4. First, (mtime_1, mtime_2, ..., mtime_N) and (ydiff_1, ydiff_2, ..., ydiff_N) are normalized according to the weight vector (w_1, w_2, w_3, ..., w_N). Then, according to the sample size returned by each computing server, (msize_g, mtime_g, xdiff_g, ydiff_g) is calculated as the arithmetic average of their corresponding values presented in (Predictor_1, Predictor_2, ..., Predictor_N).

Fig. 4. Algorithm to generate the global predictor based on partial predictors.
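A compact Python sketch of the per-server (partial) predictor and of the combination step of Fig. 4 is given below. The field names, the EnoughRegs threshold, the default value, and the direction of the weight normalization follow the description above but are otherwise assumptions of this sketch, not the authors' code.

    class Predictor:
        """Approximates the regional-means linear model: predicts transcode time
        from GOP size as mtime + (size - msize) * ydiff / xdiff."""

        def __init__(self, default_ms=100.0, enough_regs=3, boundaries=None):
            self.default = default_ms            # returned before any sample is seen (assumed)
            self.enough_regs = enough_regs       # EnoughRegs threshold (assumed value)
            self.boundaries = boundaries or []   # region upper bounds per Table 4 (assumed)
            self.samples = 0
            self.msize = self.mtime = 0.0        # accumulated global means
            self.xdiff = self.ydiff = 0.0        # approximated linear slope
            self.reg = {}                        # region -> (count, mean size, mean time)

        def _region(self, size):
            return next((k for k, b in enumerate(self.boundaries) if size <= b),
                        len(self.boundaries))

        def update(self, size, time):
            i = self._region(size)
            c, ms, mt = self.reg.get(i, (0, 0.0, 0.0))
            c += 1
            ms += (size - ms) / c                # running regional means
            mt += (time - mt) / c
            self.reg[i] = (c, ms, mt)
            # Update the slope only with enough distinct regions and a consistent tendency.
            if len(self.reg) >= self.enough_regs and \
               (ms - self.msize) * (mt - self.mtime) > 0:
                self.xdiff, self.ydiff = ms - self.msize, mt - self.mtime
            self.samples += 1
            self.msize += (size - self.msize) / self.samples
            self.mtime += (time - self.mtime) / self.samples

        def predict(self, size):
            if self.samples == 0:
                return self.default
            if self.xdiff > 0:
                return self.mtime + (size - self.msize) * self.ydiff / self.xdiff
            return self.mtime


    def combine_partials(predictors, weights):
        """Global predictor in the spirit of Fig. 4: normalize mtime/ydiff by the
        server weights, then average the fields over the partial predictors."""
        g = Predictor()
        n = len(predictors)
        g.samples = sum(p.samples for p in predictors)
        g.msize = sum(p.msize for p in predictors) / n
        g.mtime = sum(p.mtime * w for p, w in zip(predictors, weights)) / n
        g.xdiff = sum(p.xdiff for p in predictors) / n
        g.ydiff = sum(p.ydiff * w for p, w in zip(predictors, weights)) / n
        return g

Each computing server would call update() after transcoding a GOP, and the scheduler would call combine_partials() once per epoch on the collected partial predictors.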

When the scheduler schedules media units among computing servers, the time to process a given GOP on server i is predicted based on the global predictor according to the algorithm in Fig. 14. Due to the heterogeneity of the servers, we should also take into consideration the processing power of each server. Therefore, the prediction is performed as

    prediction_i = \mathrm{getPrediction}(size, Predictor_g) / w_i, \quad i = 1, 2, \ldots, N.    (6)

If multiple streams are processed in the computing cluster, it is desirable to build the partial predictors and the global predictor for each stream and perform the prediction streamwise. The reason is that, for each stream, the transcode time and unit size conform to a specific linear relation. In the rest of the paper, a stream refers to a specific movie which requires a specific transcoding operation.

4 PREDICTION-BASED LOAD BALANCING TECHNIQUES

With the prediction algorithm in place, we design prediction-based load balancing algorithms in this section. The heterogeneity of computing servers is described by the weight vector (w_1, w_2, ..., w_N) defined in Table 7.

4.1 Least Load First (LLF) and Prediction-Based Least Load First (P-LLF)

In the cluster-based Web server system Gage [13], a Least Load First (LLF) scheduling algorithm is employed to distribute client requests among servers. According to their policy, the LLF algorithm runs as follows: For each media stream, the execution time consumed by a media unit is predicted as the weighted average time of the processed units. This prediction is updated periodically by collecting information from computing servers. The prediction is used to estimate the outstanding load on each computing server and to schedule media units such that a least loaded server is chosen to process a unit.

We extend LLF with our prediction policy, proposed in Section 3.3, and propose the Prediction-based Least Load First (P-LLF) algorithm. P-LLF, as described in Fig. 5, contains two parts. Let there be N computing servers and M streams. The scheduler maintains a load indicator L_i for each server i (i = 1, 2, ..., N) and a predictor Predictor_g^j for each stream j (j = 1, 2, ..., M). The first part is the periodic adjustment of the load indicators and stream predictors. For each computing server, the load status is observed and the load indicator is updated. For each stream, the partial predictors are collected from the computing servers and combined to generate the global predictor. The second part is scheduling units according to the load status and predictions. The predicted processing time of the media unit is calculated according to the global predictor. A server is chosen such that its load is the least after the unit is scheduled. Once a unit is scheduled to the chosen server, its predicted time is stamped and it is dispatched to the server. Each computing server records its current outstanding load, i.e., the total processing time predicted for the units that are dispatched to it and waiting to be processed.

Fig. 5. Prediction-based Least Load First scheduling algorithm.

Both LLF and P-LLF aim to distribute the workload among servers proportionally to their capacity and produce high throughput, although the degree of load balancing may be different due to the accuracy of their prediction policies. In both schemes, the media units of the same stream are possibly distributed to different servers and, thus, may cause high jitter at the destination.
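The scheduling half of P-LLF can be sketched in Python as follows, reusing the Predictor and combine_partials sketches from Section 3. The class and method names, and the exact bookkeeping of outstanding load, are assumptions of this sketch mirroring the description above.

    class PredictionLLFScheduler:
        """P-LLF sketch: pick the server whose load would be least after adding
        the unit's predicted processing time, scaled by server weight per (6)."""

        def __init__(self, weights, global_predictors):
            self.weights = weights               # (w_1, ..., w_N), server capacities
            self.predictors = global_predictors  # stream_id -> global Predictor
            self.load = [0.0] * len(weights)     # outstanding predicted load L_i

        def schedule(self, stream_id, gop_size):
            base = self.predictors[stream_id].predict(gop_size)
            # Predicted time on server i per (6): getPrediction(size, Predictor_g) / w_i.
            candidates = [(self.load[i] + base / self.weights[i], i)
                          for i in range(len(self.weights))]
            _, server = min(candidates)
            predicted = base / self.weights[server]
            self.load[server] += predicted       # the unit is stamped with its predicted time
            return server, predicted

        def on_unit_done(self, server, predicted):
            """Remove the stamped predicted time once the unit has been processed."""
            self.load[server] -= predicted

        def refresh(self, stream_id, partials):
            """Periodic part: rebuild the stream's global predictor from partial predictors."""
            self.predictors[stream_id] = combine_partials(partials, self.weights)

LLF differs only in how the predicted time is obtained: it uses the accumulated average transcoding time of the stream instead of the size-dependent prediction.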

4.2 Adaptive Partitioning (AP) and Prediction-Based Adaptive Partitioning (P-AP)

There are two goals when designing a load balancing scheme in the media cluster: 1) to balance the workload among servers to achieve the maximum throughput, and 2) to maintain the flow order for each outgoing stream while processing its media units on multiple servers. We have found that the First Fit scheme [14] gives maximum throughput, but produces maximum jitter (or out-of-order departure) because the media units are distributed to all the servers. The Adaptive Load Sharing (ALS) policy essentially reduces the jitter by doing an allocation of streams at every epoch. During an epoch, a stream is allocated to only one processor, but can be allocated to any processor at different epochs. However, this approach may cause occasional waste of resources and reduce the system throughput. To strike a balance between the throughput and the delay jitter, it may be better to send media units of the same stream to a limited set of servers in an epoch. Hence, we propose an Adaptive Partitioning (AP) algorithm that dynamically partitions the servers into several subsets and establishes a mapping between the streams and the subsets. The partitioning and mapping are established according to both the observed computation requirements of different streams and the processing power of different servers. In other words, both the stream heterogeneity and the server heterogeneity are taken into account. The LLF algorithm is adopted to schedule a stream among the mapped subset of servers.

Different streams require different computational power for their specific transcoding operations. The media server records the computation complexity of each stream at time t by the vector (comp_1(t), comp_2(t), ..., comp_M(t)), where comp_j(t) is the weighted average time accumulated for the stream. We normalize the computation complexity of each stream as r_j(t) and let

    r_j(t) = comp_j(t) / \sum_{k=1}^{M} comp_k(t).    (7)

The weights of the servers can be viewed as capacity tokens that represent the workload completed in a unit time on a server. Hence, we define a token vector (T_1, T_2, T_3, ..., T_N), where T_i = w_i, and the total token of the computing cluster as Total = \sum_{i=1}^{N} T_i. The total token is then partitioned into M subsets according to the computation requirements of the streams, with each subset mapped to one stream. The total token held by subset i, i.e., by stream i, is defined as Stoken_i(t):

    Stoken_i(t) = r_i(t) \cdot Total, \quad i = 1, 2, \ldots, M.    (8)

Obviously, (Stoken_1(t), Stoken_2(t), ..., Stoken_M(t)) represent M fractions of the total token held by all the servers. Each fraction corresponds to a subset which contains at least one server. In this way, the tokens of N servers are distributed among M streams. One server may be assigned to several subsets, which means it is shared among several streams. Also, one subset may contain several servers, i.e., one stream can be processed on several servers simultaneously. If a stream is mapped to multiple servers, say k (k > 1) servers, the media server schedules the stream among the k servers according to the LLF algorithm described in Section 4.1. If a stream is mapped to only one server, all the media units of the stream are scheduled to be processed on this server.

Fig. 6 describes the partitioning algorithm, which partitions the N servers into M subsets according to the servers' processing power and the streams' computation requirements. The partitioning is expressed as (subset_1, subset_2, ..., subset_M), where subset_j is a set of servers. Repartitioning is performed only when some r_j(t) varies by more than a tolerable percentage. In summary, the AP algorithm works as follows: The new computation requirements of the streams determine whether repartitioning is needed. If it is needed, the mapping between streams and subsets of servers is reestablished.

Fig. 6. Partitioning algorithm.
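The token-based partitioning of Fig. 6 can be sketched as below: each stream claims Stoken_i(t) = r_i(t) * Total worth of server tokens, and servers are assigned to stream subsets until their tokens are exhausted. The greedy, in-order assignment and the function name are assumptions of this sketch, since Fig. 6 itself is not reproduced here.

    def partition_servers(stream_comp, server_weights):
        """stream_comp: comp_j(t) per stream; server_weights: T_i = w_i per server.
        Returns subsets[j] = list of (server index, token share given to stream j)."""
        total = sum(server_weights)
        comp_total = sum(stream_comp)
        stokens = [c / comp_total * total for c in stream_comp]   # (7) and (8)

        remaining = list(server_weights)      # tokens each server still has to give
        subsets = [[] for _ in stream_comp]
        server = 0
        for j, need in enumerate(stokens):    # walk the servers in order, splitting as needed
            while need > 1e-9 and server < len(remaining):
                give = min(need, remaining[server])
                if give > 0:
                    subsets[j].append((server, give))
                    remaining[server] -= give
                    need -= give
                if remaining[server] <= 1e-9:
                    server += 1
        return subsets

    # Example resembling Fig. 7: three servers with a total token of 6 shared by
    # four streams (the concrete numbers are illustrative, not taken from the figure).
    print(partition_servers(stream_comp=[2.0, 1.0, 2.0, 1.0],
                            server_weights=[2.0, 2.0, 2.0]))

A server that ends up in several subsets is shared among the corresponding streams, and a subset with several servers lets one stream be processed in parallel, exactly the two cases discussed above.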

Fig. 7 shows a partitioning example of three servers and four streams. The servers have different capacities. The T_i's held by the servers and the Stoken_i(t)'s held by the streams are shown in the figure; the total token held by the servers is 6. The mapping established between the streams and the subsets is demonstrated in Table 8. According to Fig. 7 and Table 8, stream 1 is mapped to two servers, server 1 and server 2. Its media units are scheduled among these two servers according to the P-LLF algorithm. Streams 3 and 4 share server 3, and streams 1 and 2 share server 2.

Fig. 7. A partitioning example.

TABLE 8 Mapping Streams to Subsets

We also extend AP to P-AP by employing the prediction policy proposed in Section 3. With P-AP, the computation complexity of each stream, comp_j(t), equals the mtime_g^j(t) proposed in Section 3. When a stream is mapped to several computing servers, P-LLF is used to choose a server among them.

5 EXPERIMENTAL EVALUATION

5.1 Experimental Settings

Table 9 describes the hardware and software configurations of the Media Server and the Computing Servers, implemented in our laboratory. For the streaming service, four movies are currently used, namely, Lord of the Rings, The Matrix, Peter Pan, and Resident Evil. To satisfy the clients' requirements, three kinds of transcoding operations are performed, namely, changing the bit rate, resizing the frames, and changing the frame rate. To process a stream in the media cluster, one of the three transcoding operations is performed on one of the movies. The transcoding service provided by each server is derived from a powerful multimedia processing tool called FFMPEG [17].

TABLE 9 Hardware and Software Configuration of Media Server Cluster

Fig. 8 demonstrates the software framework of our media cluster. We implement a multithreaded architecture in order to overlap computation and communication. On the media server, four kinds of threads, namely, the retriever, scheduler, dispatcher, and manager, run concurrently. The retriever continuously retrieves media units from the disk and stores them in the unit buffer, which adopts a FIFO policy. A dispatch queue is maintained for each server, which holds all the media units that have been scheduled to that server. The scheduler fetches units from the unit buffer and puts them into a dispatch queue according to the load balancing policy discussed in Section 4. Upon the request of a server, the dispatcher gets a media unit from the corresponding dispatch queue and transmits it to the server. The manager periodically collects information from the servers and feeds the information to the scheduler. On the server node, four threads, the receiver, transcoder, sender, and monitor, run concurrently. The receiver receives packets from the Manager and assembles them into complete media units. Once a complete media unit is ready, the transcoder transcodes the media unit. After transcoding, the sender delivers the media unit to the client. Once the receiver gives the media unit to the transcoder for processing, it requests another media unit from the media server by sending a Ready message. The monitor collects information on the server and reports it to the Manager periodically.

Fig. 8. Software framework of the media cluster.
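The thread structure on the media server side can be pictured with the following Python sketch, where the threads communicate through FIFO queues as described above. The thread and function names follow the text; the queue sizes, the number of servers, the 2-second feedback epoch, and the use of callables for I/O are assumptions of this sketch.

    import queue, time

    unit_buffer = queue.Queue(maxsize=64)                          # FIFO unit buffer (size assumed)
    dispatch_queues = [queue.Queue(maxsize=8) for _ in range(4)]   # one queue per computing server

    def retriever(read_unit):
        while True:                                  # continuously pull media units from disk
            unit_buffer.put(read_unit())

    def scheduler(policy):
        while True:                                  # load balancing policy of Section 4
            unit = unit_buffer.get()
            dispatch_queues[policy(unit)].put(unit)

    def dispatcher(server_id, send):
        while True:                                  # transmits the next queued unit to the
            send(server_id, dispatch_queues[server_id].get())   # server (Ready handshake omitted)

    def manager(poll_servers, update_scheduler, epoch=2.0):
        while True:                                  # periodic feedback fed to the scheduler
            update_scheduler(poll_servers())
            time.sleep(epoch)

    # Each function would run in its own thread, e.g.
    # threading.Thread(target=scheduler, args=(policy,), daemon=True).start()

Overlapping the retriever, scheduler, and dispatcher in separate threads is what lets disk I/O, scheduling decisions, and network transmission proceed concurrently.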

5.2 Performance Metrics

Sensitivity to the above design parameters and the efficiency of our media cluster are measured with respect to the following performance metrics.

5.2.1 System Scalability

A parallel system is desired to be scalable. As the cluster size increases, more and more media units should be processed by the system in a unit of time. Hence, we measure the scalability of our cluster in terms of system throughput, which is defined as GOPs/sec, when comparing different load balancing schemes.

5.2.2 Load Sharing Overhead

When using the prediction-based schemes or the feedback-based scheme ALS, the system throughput may degrade due to the overhead caused by collecting information from computing servers and adapting to load imbalance. We define the Load Sharing Overhead as the average time consumed by the media server to poll through all servers to collect information.

5.2.3 Video Quality

As a special requirement imposed by multimedia streaming on our parallel servers, the video quality observed at the receiver side is a very important metric. To observe how the transcoded units are delivered from the computing servers to the media server and then go to the client, we write a program called Departure-Recorder and run it on the media server. Departure-Recorder receives the transcoded units from the computing servers and records the time to receive each unit without extra reordering. Based on this information, we can evaluate the traffic pattern of the departing streams so as to predict the video quality at the client side. To describe the traffic pattern of outgoing streams, we define three metrics as follows:

Metric a: Departure Jitter per Stream is the standard deviation of the interdeparture time among GOPs when the stream departs the media server. It depicts and predicts how smoothly a stream may be played out at the client side in real time.

Metric b: Average Interdeparture Time among GOPs per Stream is the mean of the interdeparture times among GOPs when the stream departs from the media server.

Metric c: Out-of-Order Rate per Stream describes how many GOPs among all the GOPs in a stream depart out of order.

5.3 Evaluation of Results

5.3.1 System Throughput

Scalability of the system throughput is one of the most important metrics that we need to examine when comparing different load balancing schemes. Since the throughput is highly affected by the input workload, we generate a large enough number of streams such that the Media Unit Buffer never becomes empty. Thus, we can measure the performance of different load balancing schemes in a fully loaded system. Table 10 illustrates the detailed experimental settings. The load test epoch for the feedback-based schemes is 2 seconds.

TABLE 10 Experimental Setting

Fig. 9 demonstrates the scalability of system throughput with increasing cluster size. FF achieves the best scalability because it does not collect feedback from servers. Surprisingly, although P-LLF has high load sharing overhead compared to FF, its throughput is only slightly affected and closely approaches FF. The reason is that the throughput of FF is implicitly affected by the inaccuracy of observing load status through the dispatch queues and by its blindness to stream heterogeneity when scheduling units. Instead, P-LLF explicitly tests the load status and considers the stream heterogeneity for scheduling. Therefore, with P-LLF, the load sharing overhead is counteracted by the better load balancing achieved among servers. Compared to P-LLF, P-AP performs the extra operation of repartitioning servers and remapping streams to servers. But, at the same time, P-AP performs less computation than P-LLF when scheduling units because the least loaded server is chosen within a small subset instead of within the whole cluster. Hence, the throughput of P-AP still approaches that of P-LLF. LLF and AP both perform worse than their counterparts P-LLF and P-AP due to the inefficiency of their simplistic prediction method.

Fig. 9. Scalability of the system throughput.

On the other hand, SM and ALS have much lower throughput than the other schemes for two reasons. First, there is the potential load imbalance incurred by maintaining the flow consistency. Second, they schedule streams without considering the heterogeneity among streams. SM avoids dispersing media units of the same stream among different servers even if a server is free. This causes waste of resources and occasional imbalance in load distribution, and reduces the throughput. ALS involves high load sharing overhead and does not take into account the stream heterogeneity. Besides, the HRW function works better for a very large number of homogeneous streams. Therefore, ALS presents less scalability than the others in the experiments.

5.3.2 Load Sharing Overhead

Table 11 illustrates the load sharing overhead in milliseconds for the five schemes, P-LLF, P-AP, LLF, AP, and ALS. Because P-LLF and P-AP share the same prediction scheme and both collect stream predictors at the end of each epoch, they have the same load sharing overhead. Similarly, LLF and AP share the same prediction scheme, where the accumulated average transcoding times per stream are collected in each epoch.

TABLE 11 Load Sharing Overheads

The load test overhead of the ALS scheme is different from that of the prediction-based schemes because only the CPU utilization information needs to be collected from the servers. As shown in the table, the load test overhead increases almost linearly with the cluster size for all five schemes because the communication expense increases linearly with the number of servers in the cluster. In addition, ALS incurs less overhead than the prediction-based schemes because its overhead is independent of the number of streams. As the cluster size increases, the prediction-based schemes incur higher overhead than ALS due to the increased number of streams. Even so, we observe through the experimental results that prediction does lead to higher system throughput. AP/LLF has less overhead than P-AP/P-LLF because the message size transmitted per stream is smaller than that of P-AP/P-LLF.

5.3.3 Video Quality

The traffic pattern of outgoing streams observed on the media server reflects the user-perceived video quality at the receiver side. We measure the video quality in terms of the interdeparture time among GOPs, the departure jitter, and the Out-of-Order (OFO) departure rate. The experimental settings are the same as those for testing the system throughput.

As shown in Fig. 10, FF, LLF, and P-LLF incur the largest OFO departure rate since they schedule the media units freely without maintaining flow order. LLF and P-LLF perform similarly, with P-LLF doing marginally better because of a better prediction algorithm. FF does a little worse than LLF/P-LLF because of the delay in observing overload on servers through the dispatch queues. With the flow concept in mind, SM and ALS incur small OFO departure. It is interesting to note that AP/P-AP greatly reduces OFO departure compared to P-LLF and FF and incurs only marginally higher OFO departure than SM and ALS. Given that P-AP also produces very good throughput, as shown in Fig. 9, it is a very promising scheduling scheme to achieve both high throughput and a low OFO departure rate.

Fig. 11 illustrates the departure jitter per stream for all load sharing schemes. It shows the same tendency as the variation of the OFO departure rate. The best departure jitter is achieved by SM because it processes all units of the same stream on the same server, thus guaranteeing in-order departure and producing the smallest jitter. ALS maintains the flow order at higher computation and communication overheads, thus incurring slightly larger jitter than SM. AP and P-AP map each stream to a limited set of servers; hence, they reduce out-of-order processing of consecutive units belonging to the same stream and, thus, reduce jitter about as well as ALS.

Fig. 12 demonstrates the average interdeparture time among GOPs per stream. Since we have multiple streams in the system, as shown in Table 10, the interdeparture time per stream is not simply the inverse of the system throughput. But it still shows a similar scalability as that of the system throughput. FF achieves the best performance because it has no overhead. As we can see, P-LLF closely approaches FF. P-AP achieves a similar effect as that of P-LLF, both better than LLF and AP because of the better load balancing led by better prediction. SM and ALS fail to give good performance because of their blindness to the stream heterogeneity and the overhead to maintain flow order.

Fig. 10. Out-of-order departure rate.

Fig. 11. Departure jitter.

Fig. 12. Interdeparture time.
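For reference, the three per-stream metrics defined in Section 5.2 can be computed directly from the timestamps logged by Departure-Recorder. The sketch below assumes the log provides, per stream, the departure time and the sequence number of each GOP, and it treats a GOP as out of order if it departs after a higher-numbered GOP of the same stream; both the function name and that operational definition are assumptions of this sketch.

    from statistics import mean, pstdev

    def stream_metrics(departures):
        """departures: list of (departure_time_sec, gop_sequence_number) for one
        stream, in the order the GOPs left the media server (at least two entries)."""
        times = [t for t, _ in departures]
        gaps = [b - a for a, b in zip(times, times[1:])]    # interdeparture times

        avg_interdeparture = mean(gaps)                      # Metric b
        departure_jitter = pstdev(gaps)                      # Metric a: std. dev. of the gaps

        # Metric c: fraction of GOPs departing after a GOP with a larger sequence number.
        out_of_order = 0
        max_seq_seen = -1
        for _, seq in departures:
            if seq < max_seq_seen:
                out_of_order += 1
            else:
                max_seq_seen = seq
        ofo_rate = out_of_order / len(departures)

        return departure_jitter, avg_interdeparture, ofo_rate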

Fig. 13. Algorithm to dynamically build the GOP predictor based on processed GOPs.

Fig. 14. Algorithm to predict the transcode time for a given GOP.

6 RELATED WORK

The transmission of multimedia information through networks has long been a research topic, and it is claimed that multimedia applications are becoming one of the killer applications in this century. Due to receiver heterogeneity and dynamically varying network conditions, a multimedia stream should be transformed to meet different clients' requirements. A traditional way to solve the above problem is to store multiple copies of the source stream on the media server and select a copy according to some initial negotiation with the client. Although disks have gotten larger and cheaper, online transcoding service has become a widely adopted solution to provide various clients with the media source according to their requirements. Transcoding is a process that transforms a compressed video bit stream into different bit streams, either by employing the same compression format with alternate parameters or by employing another format altogether.

Many researchers [1], [2], [18], [19] have addressed how to customize multimedia contents to match user preferences or the diversity of network conditions and display devices. Chandra et al. [1] used JPEG transcoding techniques to customize the size of objects constituting a Web page, thus allowing a Web server to dynamically allocate the available bandwidth among different classes. Fox et al. proposed to dynamically distill the Web page content on active proxies while it is transmitted through the network [6], [5]. They also implemented a cluster-based Web distillation proxy called TranSend [20]. However, the most computationally expensive task performed by TranSend is the distillation of images. Their scheduling schemes emphasized fault tolerance more than load balancing, and parallel processing of a single stream was not considered. Welling et al. proposed the concept of a Cluster-based Active Router Architecture (CLARA) [3], where a computing cluster is attached to a dedicated router. Using CLARA, the multimedia transcoding tasks are processed in parallel in the computing cluster instead of on the router itself [3], [4]. This solution brings up the new problem of how to efficiently utilize the resources provided by the computing cluster to meet media streaming requirements.

In the network domain, the goal of load balancing is complicated by the additional requirement of preserving the flow order. Random distribution cannot preserve packet order within a flow if per-flow information is not maintained. The modulus-based round-robin policy also has the drawback that all flows are remapped if the number of computing nodes is changed. There has been almost no research on developing load balancing algorithms with the concept of flows in mind. One related paper was published by Kencl and Boudec [15], who extended the HRW algorithm [21] with a feedback mechanism to do adaptive load balancing in network processors.
This allows adjustment to the load distribution with minimum flow remapping [15] and copes with request identifier space locality. We find the paper interesting, but they only reported some theoretical and simulation results without any real implementation. Taking MPEG transcoding as the first application in their cluster-based active router architecture, Welling et al. adopted round-robin algorithm to dispatch media units for transcoding among the nodes [3]. However, no experimental results were provided for this. By designing and implementing an active router cluster supporting transcoding service, we were able to evaluate three load sharing schemes, namely, round robin, stream-based round-robin, and adaptive load sharing [14]. It was shown that roundrobin is simple and fast, but provides no guarantee to the playback quality of output streams because it causes out-oforder departure of processed media units. Adaptive load sharing scheme, proposed by Kencl and Boudec [15], achieves better unit order in output streams, but involves higher overhead to map the media unit to an appropriate node. As a result, the throughput is reduced. Stream-based round robin achieves good performance in terms of both throughput and output order, but its advantage is confined to a homogeneous and highly loaded system.

For MPEG-4 encoding, He et al. proposed several scheduling algorithms that allocate MPEG-4 objects among multiple workstations in a cluster to achieve a real-time interactive encoding rate [22]. When an object is partitioned and processed on multiple workstations, the data dependence is resolved by storing related reference data in each processor's local memory for motion estimation. The scheduling algorithms are derived on the basis of a video model called the MPEG-4 Object Composition Petri Net (MOCPN), which captures the spatio-temporal relationship between various objects and user interaction. However, in their schemes, the appearance and disappearance of objects in a video session is simply modeled by user interactions, which is not true for an automatic MPEG-4 playout session. In addition, their scheduling algorithms lack generality for conventional frame-based coding schemes like MPEG-1/2 and H.263.

7 CONCLUSION

The aim of the paper was to develop scheduling and load balancing algorithms to ensure high throughput and low jitter for multimedia processing on a cluster-based Web server where a few computing nodes are separately reserved for high-performance multimedia applications. We consider the multimedia streaming service, which requires on-demand transcoding operations, as an example. In this paper, we designed and implemented a media cluster and evaluated the efficiency of seven load scheduling schemes for a real MPEG stream transcoding service. Due to the variability of the individual transcoding jobs, it was necessary to predict the execution time for each job and schedule accordingly. We proposed a dynamic prediction algorithm that predicts the transcoding time for each media unit. Based on the algorithm, we proposed two new load sharing policies, Prediction-based Least Load First (P-LLF) and Prediction-based Adaptive Partitioning (P-AP). For comparison, we implemented Least Load First (LLF) and Adaptive Partitioning (AP) policies where the prediction is based on the average execution time. Besides, we also implemented three nonprediction-based schemes, namely, First Fit (FF), Stream-based Mapping (SM), and Adaptive Load Sharing (ALS). Among the seven load sharing schemes, FF, P-LLF, and LLF achieve high throughput but also incur high jitter, whereas P-AP, AP, SM, and ALS try to maintain the unit order of outgoing streams to reduce jitter. Experimental results show that a good balance between throughput and departure jitter is achieved by P-AP. P-AP outperforms FF, LLF, and P-LLF because it establishes a mapping between streams and subsets of servers. P-AP outperforms SM and ALS because it takes into consideration the stream heterogeneity. P-AP and P-LLF outperform their counterparts, AP and LLF, because of the better prediction method.

REFERENCES

[1] S. Chandra, C.S. Ellis, and A. Vahdat, Differentiated Multimedia Web Services Using Quality Aware Transcoding, Proc. INFOCOM th Ann. Joint Conf. IEEE Computer and Comm. Soc., Mar [2] R. Keller, S. Choi, M. Dasen, D. Decasper, G. Fankhauser, and B. Plattner, An Active Router Architecture for Multicast Video Distribution, Proc. IEEE INFOCOM, [3] G. Welling, M. Ott, and S. Mathur, A Cluster-Based Active Router Architecture, IEEE Micro, vol. 21, no. 1, Jan./Feb [4] M. Ott, G. Welling, S. Mathur, D. Reininger, and R. Izmailov, The Journey Active Network Model, IEEE J. Selected Areas in Comm., vol. 19, no. 3, pp , Mar [5] A. Fox, S. Gribble, E.
Brewer, and E. Amir, Adapting to Network and Client Variability via On-Demand Dynamic Distillation, Proc. Seventh Int l Conf. Architecture Support for Programming Language and Operating Systems (ASPLOS-VII), [6] A. Fox, S.D. Gribble, and Y. Chawathe, Adapting to Network and Client Variation Using Active Proxies: Lessons and Perspectives, IEEE Personal Comm. on Adaptation, special issue, [7] E. Amir, S. McCanne, and R. Katz, An Active Service Framework and Its Application to Real-Time Multimedia Transcoding, Proc. ACM SIGCOMM Symp., Sept [8] B.A. Shirazi, A.R. Hurson, and K.M. Kavi, Scheduling and Load Balancing in Parallel and Distributed Systems. CS Press, [9] M. Satyanarayanan, Scalable, Secure, and Highly Available Distributed File Access, Computer, May [10] E. Katz, M. Butler, and R. McGrath, A Scalable Http Server: The Ncsa Prototype, Computer Networks and ISDN Systems, vol. 27, pp , [11] H. Zhu, T. Yang, Q. Zheng, D. Watson, O. Ibarra, and T. Smith, Adaptive Load Sharing for Clustered Digital Library Servers, Proc. Seventh Int l Symp. High Performance Distributed Computing, pp , [12] H. Zhu, H. Tang, and T. Yang, Demand-Driven Service Differentiation in Cluster-Based Network Servers, Proc. IEEE INFOCOM, [13] C. Li, G. Peng, K. Gopalan, and T. Chiueh, Performance Garantees for Cluster-Based Internet Services, Proc. 23rd Int l Conf. Distributed Computing Systems (ICDCS 03), May [14] J. Guo, F. Chen, L. Bhuyan, and R. Kumar, A Cluster-Based Active Router Architecture Supporting Video/Audio Stream Transcoding Services, Proc. 17th Int l Parallel and Distributed Processing Symp. (IPDPS 03), Apr [15] L. Kencl and J.Y.L. Boudec, Adaptive Load Sharing for Network Processors, Proc. IEEE INFOCOM, [16] A.D. Bavier, A.B. Montz, and L.L. Peterson, Predicting Mpeg Execution Times, Proc. Int l Conf. Measurements and Modeling of Computer Systems (SIGMETRICS), pp , [17] Ffmpeg Multimedia System, [18] C.K. Hess, D. Raila, R.H. Campbell, and D. Mickunas, Design and Performance of MPEG Video Streaming to Palmtop Computers, Multimedia Computing Networks, [19] E. Amir, S. McCanne, and H. Zhang, An Application Level Video Gateway, Proc. ACM Multimedia, [20] A. Fox, S.D. Gribble, Y. Chawathe, E.A. Brewer, and P. Gauthier, Cluster-Based Scalable Network Services, Proc. Symp. Operating Systems Principles, [21] K.W. Ross, Hash Routing for Collections of Shared Web Caches, IEEE Network, vol. 11, no. 6, Nov.-Dec [22] Y. He, I. Ahmad, and M.L. Liou, Real-Time Interactive MPEG-4 System Encoder Using a Cluster of Workstations, IEEE Trans. Multimedia, vol. 1, no. 2, pp , ACKNOWLEDGMENTS The authors thank Professor Inbum Jung for providing them with the various MPEG streams and helping them understand the terms in the multimedia area.

Jiani Guo received the BE and ME degrees in computer science and engineering from the Huazhong University of Science and Technology, and the PhD degree in computer science from the University of California, Riverside. She currently works for Cisco Systems. Her research interests include job scheduling in parallel and distributed systems, cluster computing, active routers, and network processors. She is a student member of the IEEE.

Laxmi Narayan Bhuyan has been a professor of computer science and engineering at the University of California, Riverside, since January. Prior to that, he was a professor of computer science at Texas A&M University and program director of the Computer System Architecture Program at the US National Science Foundation. He has also worked as a consultant to Intel and HP Labs. Dr. Bhuyan's current research interests are in the areas of computer architecture, network processors, Internet routers, and parallel and distributed processing. He has published more than 150 papers in these areas in reputable journals and conference proceedings. He has served on the editorial boards of Computer, the IEEE Transactions on Computers, the IEEE Transactions on Parallel and Distributed Systems, the Journal of Parallel and Distributed Computing, and the Parallel Computing Journal. Dr. Bhuyan is a fellow of the IEEE, the ACM, and the AAAS. He is also an ISI Highly Cited Researcher in Computer Science.
