Fast Web Page Allocation On a Server Using Self-Organizing Properties of Neural Networks

Vir V. Phoha, Computer Science, Louisiana Tech University, Ruston, LA 71272, phoha@coes.latech.edu
S. S. Iyengar, Computer Science, Louisiana State University, Baton Rouge, LA 70803, iyengar@bit.csc.lsu.edu
R. Kannan, Computer Science, Louisiana State University, Baton Rouge, LA 70803, rkannan@bit.csc.lsu.edu

ABSTRACT

This paper presents a novel competitive neural network learning approach to schedule requests to a cluster of Web servers. Traditionally, the scheduling algorithms for distributed systems are not applicable to control Web server clusters because different client domains have different Web traffic characteristics, Web workload is highly variable, and Web requests show a high degree of self-similarity. Using the self-organizing properties of neural networks, we map Web requests to servers through an iterative update rule. This update rule is a gradient descent rule for a corresponding energy function that exploits the self-similarity of the Web page requests and includes terms for load balancing. Heuristics to select parameters in the update rule that balance hits against load balancing among servers are presented. Simulations show an order of magnitude improvement over traditional DNS-based load-balancing approaches. More specifically, the performance of our algorithm ranged between an 85% and 98% hit rate, compared to a performance range of 2% to 40% hit rate for a Round Robin scheme when simulating real Web traffic. As the traffic increases, our algorithm performs much better than the round robin scheme. A detailed experimental analysis is presented in this paper.

Keywords: World Wide Web, Self-Organization, Competitive Learning, Internet, Load Balancing

1. Introduction

The World-Wide-Web offers tremendous opportunities for marketers to reach a vast variety of audiences at less cost than any other medium. With huge amounts of capital invested in Web sites and in the various databases and services on these sites, it has become necessary to understand the effectiveness and realize the potential of the opportunities offered by these services. In the past few years the computational approach to artificial intelligence and cognitive engineering has undergone a significant evolution. Not only is this transformation, from discrete symbolic reasoning to massively parallel, connectionist neural modeling, of compelling scientific interest, it is also of considerable practical value. In this paper, we present a novel application of connectionist neural modeling that maps web page requests to web server caches to maximize the hit ratio and at the same time satisfies the conflicting requirement of distributing the web requests equally among web caches. In particular, we describe and present a new neural network learning algorithm for fast Web page allocation on a server using the self-organizing properties of neural networks.
Internet traffic has stabilized at roughly doubling every year since 1997, and this growth rate is expected to be sustained for some time [CK01]. Web sites with heavy traffic load must use multiple servers running on different hardware; this structure facilitates the sharing of information among servers through a shared file system or via a shared data space. Examples of such systems include the Andrew File System (AFS) and the Distributed File System (DFS). If such facilities are not available, each server may have its own independent file system.

Figure 1: A typical routing configuration of web page requests. Requests from clients arrive at a router, which directs them to HTTP servers S_1, S_2, ..., S_N; each server has its own cache and disk, and the servers may share a common file system.

Figure 1 describes an example of this type of system, where requests may come from various client sites to the router, which then pools the requests and directs them to different servers [IC97]. Here the servers S_1, S_2, ..., S_N each have their own cache and may share a common file system. Correspondingly, each of these servers may have its own individual storage; the router decides the allocation of web page requests to individual servers.

The organization of this paper is as follows. In section 1.1, we describe the terminology used in this paper. In section 2, we review the existing approaches to different web server routing systems. In section 3, we present background concepts related to the Pareto distribution and self-similarity, and an overview of neural network learning algorithms. Section 4 describes the problem formulation, the new model for this problem, and our algorithmic approach to solving it. Section 5 presents the experimental results and section 6 concludes the paper.
1.1 Terminology Used in This Paper

We define a client as a program that establishes connections to the Internet, whereas a Web server (or simply a server) stores information and serves client requests. For our purposes, a distributed Web-server system is any architecture of multiple Web server hosts that has some means of spreading the client requests among the servers. The traffic load on a web site is measured in the number of HTTP requests handled by the web site. A session is an entire period of access from a single user to a given Web site. A session may issue many HTML page requests. Typically, a Web page consists of a collection of objects, and an object request requires an access to a server. Any access to a server for an object is defined as a hit. We use site and web site interchangeably to mean a collection of Web objects meant to be served. An object is an entity, such as a web page or an image, that is served by the server. An object-id is an item that uniquely identifies an object in a Web page. An object-id may be a URL, a mapping of a URL to a unique numerical value, and so on; however, only one chosen method is consistently followed. There are N servers that service the requests for Web objects. The servers are identified as S_1 through S_N. The Web-server cluster is scalable and uses one URL to provide a single interface to users. For example, a single domain name may be associated with many IP addresses and each address may belong to a different Web server. The collection of Web servers is transparent to the users. Each request for a Web page is identified as a duplet <O_i, R_i>, where 1 ≤ i ≤ M, O_i is the object identifier of the object requested, R_i is the number of requests served so far for that object, and M is the total number of objects present at the site.

2. A Review of the Existing Approaches

There are four basic approaches [CCY99b] to route requests among distributed Web-server nodes: (1) client-based, (2) DNS-based, (3) dispatcher-based, and (4) server-based. In the client-based approach, requests can be routed to any Web server architecture even if the nodes are loosely connected or uncoordinated. The routing decisions can be embedded in the Web clients, such as browsers, or in client-side proxy servers. For example, Netscape spreads the load among various servers by selecting a random number i between 1 and the number of servers and directing the requests to the server wwwi.netscape.com. This approach is not widely applicable, as it is not easily scalable and most Web sites do not control the browsers that would distribute the load among their servers. One proposal that combines caching and server replication is given in [BBM97]. However, client-side proxy servers require modifications to Internet components that are beyond the control of many institutions that manage Web server systems. The Domain Name System (DNS) can implement a large set of scheduling policies by translating a symbolic name to an IP address. The DNS approach is limited by the constraint of 32 Web servers for each public URL because of UDP packet size constraints, although it can be scaled easily from LAN to WAN distributed systems. In the dispatcher approach, a single entity controls the routing decisions. This approach provides finely tuned load balancing, but failure of the centralized controller can disable the whole system. The server-based approach uses two levels of dispatching: first, the cluster DNS assigns
requests to Web servers; then each Web server may reassign the request to any other server in the cluster. The server-based approach can attain as good control as the dispatcher approach, but redirection mechanisms typically increase the latency perceived by users. To our knowledge, only the Internet2 Distributed Storage Infrastructure Project (I2-DSI) proposes a smart DNS that uses network proximity information, such as transmission delays, in making routing decisions [BM98]. However, none of these approaches incorporates any kind of intelligence or learning in routing Web requests.

Many artificial neural network architectures have the ability to discover for themselves patterns, features, correlations, or categories in the input data and encode them in the output; that is, these neural networks display self-organizing properties. Several researchers [GB99, LT94] have detected self-similarity in Internet traffic flows. Since Web page requests have been shown to be self-similar and many of them may be generated by the same user, we can use neural networks to make routing decisions that take the self-similar nature of Web requests into consideration.

In earlier work, self-organizing properties of neural networks have been used in various other applications. Rather than listing all the applications, we list those that have directly motivated this work. Phoha [P92] and Phoha and Oldham [PO96] use competitive learning to develop an iterative algorithm for image recovery and segmentation. Using Markov Random Fields, they develop an energy function that has terms corresponding to smoothing and to edge detection and preservation. An update rule is then developed to restore and segment images; this update rule is a gradient descent rule for the energy function. Modares et al. [MSE99] use a self-organizing neural network to solve multiple traveling salesman and vehicle routing problems. Their algorithm shows significant advances both in the quality of solutions and in computational effort for most of the experimental data. Yeung and Yum [YY98] present a node placement algorithm for shuffle nets that uses a communication cost function between a pair of nodes and develop a gradient descent algorithm that places node pairs one by one.

2.1 Special Considerations of Web Server Routing

Requests encrypted using the Secure Socket Layer (SSL) use a session key to encrypt information passed between a client and a server. Since session keys are expensive to generate, each SSL session key has a lifetime of about 100 seconds, and requests between a specific client and server within the lifetime of the key use the same session key. So it is highly desirable that multiple requests from the same client be routed to the same server, as a different server may not know about the session key. In our approach, the competitive learning mechanism achieves this by the very nature of the algorithm. (Note: (1) IBM's Network Dispatcher (ND) achieves this by routing two SSL requests received within 100 seconds from the same client to the same server. (2) Using a simple Round Robin scheme would be very inefficient, as it would require the generation of many session keys.) Web server load information becomes obsolete quickly and is poorly correlated with future load conditions [CCY99a]. Since the dynamics of the WWW involve high variability of domain and client workloads, exchanging information about the load conditions of servers is not sufficient to support scheduling decisions.
Hence, we need a real-time adaptive mechanism that adapts rapidly to the changing environment.

3. Background
This section gives the background material, such as the Pareto distribution, self-similarity, competitive learning, and Kohonen's algorithm, that is used in the subsequent sections of the paper.

3.1 Pareto Distribution and Self-Similarity

Pareto distribution: Web traffic can be modeled using the Pareto distribution, an example of a heavy-tailed distribution. A heavy-tailed distribution is asymptotically hyperbolic; that is, irrespective of the distribution of a random variable x over a short period of time, over the long run the distribution function of x is hyperbolic. The probability mass function of the Pareto distribution is

    p(x) = γ k^γ x^(-γ-1),  γ, k > 0,  x ≥ k,

and its cumulative distribution is given by

    F(x) = P[X ≤ x] = 1 - (k/x)^γ.

Here k represents the smallest value of the random variable and γ determines the behavior of the distribution; the parameters may be estimated from historical data. For example, if γ ≤ 2 the Pareto distribution has infinite variance, and if γ ≤ 1 the distribution also has an infinite mean. For more details, the interested reader is referred to [PF95].

Self-similarity: Self-similarity is usually associated with fractals; the shape of a self-similar object is similar regardless of the scale at which it is viewed. For example, a coastline has similar structure at a scale of one mile, 10 miles, or 200 miles. In the case of a stochastic phenomenon, such as a time series, self-similarity means that the object's correlational structure remains unchanged at different time scales. Both Ethernet traffic and Web traffic have been shown to be self-similar [LTWW94] [CB97].
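To make the heavy-tailed behavior concrete, the Pareto distribution above can be sampled by inverse-transform sampling: if u is uniform on [0, 1), then x = k (1 - u)^(-1/γ) satisfies F(x) = 1 - (k/x)^γ. The following minimal Python sketch illustrates this; the function name and the parameter values γ = 0.9 and k = 0.1 (which happen to match the simulation settings reported in Section 5) are our own illustrative choices, not code from the original study.

    import random

    def pareto_sample(gamma: float, k: float) -> float:
        """Draw one sample from a Pareto(gamma, k) distribution via
        inverse-transform sampling: F(x) = 1 - (k/x)**gamma for x >= k."""
        u = random.random()                      # uniform on [0, 1)
        return k * (1.0 - u) ** (-1.0 / gamma)

    if __name__ == "__main__":
        # gamma = 0.9 <= 1 implies an infinite mean (and variance), so a few
        # very large samples dominate the sum -- the heavy tail in action.
        samples = [pareto_sample(gamma=0.9, k=0.1) for _ in range(10000)]
        print("largest sample:", max(samples))
        print("fraction below 1.0:", sum(s < 1.0 for s in samples) / len(samples))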
3.2 Neural Network Background

This section gives a brief background on competitive learning and Kohonen's rule. We first describe the idea of competitive learning and then describe a mapping of this technique to web server routing.

3.2.1 Competitive Learning

In the simplest competitive learning network there is a single layer of output units S = {S_1, S_2, ..., S_N}, each fully connected to a set of inputs O_i via connection weights w_ij. A brief description of a competitive learning algorithm follows. Let O = {O_1, O_2, ..., O_M} be an input to a network of two layers with an associated set of weights w_ij. The standard competitive learning rule [H91] is given by

    Δw_{i*j} = η (O_j - w_{i*j}),

which moves w_{i*} towards O; the i* indicates that only the set of weights corresponding to the winning node is updated. The winning node is taken to be the one with the largest output. Another way to write this is

    Δw_ij = η S_i (O_j - w_ij),

where S_i = 1 for the i corresponding to the largest output and S_i = 0 otherwise. This is the adaptive approach taken by Kohonen in his first algorithm (see the Kohonen model below). The usual definition of competitive learning requires a winner-take-all strategy. In many cases this requirement is relaxed to update all of the weights in proportion to some criterion. This form of competitive learning is referred to as leaky learning [H91]. Hertz et al. [H91] discuss various forms of this adaptive processing for different problems, including the traveling salesperson problem. It has become standard practice to refer to all of these as Kohonen-like algorithms.

3.2.2 Kohonen's Algorithm

Kohonen's algorithm [K89], [H91] adjusts the weights from common input nodes to N output nodes arranged in a two-dimensional grid (see Figure 2 below) to form a vector quantizer. Input vectors are presented sequentially in time, and after enough input vectors have been presented the weights specify clusters or vector centers. These clusters or vector centers sample the input space such that the point density function of the vector centers approximates the probability density function of the input vectors. The algorithm also organizes the weights such that topologically close nodes are sensitive to physically similar inputs. Output nodes are thus ordered in a natural fashion, and the algorithm forms feature maps of the inputs. A description of this algorithm follows. Let x_1, x_2, ..., x_N be a set of input vectors, each of which defines a point in N-dimensional space. The output units O_i are arranged in an array and are fully connected to the input via the weights w_ij. A competitive learning rule is used to choose a winner unit i* such that

    ‖w_{i*} - x‖ ≤ ‖w_i - x‖  for all i;

the Kohonen update rule is then given by

    Δw_i = η h(i, i*) (x - w_i^old).

Here h(i, i*) is the neighborhood function: h(i, i*) = 1 if i = i*, and it falls off with the distance ‖r_i - r_{i*}‖ between units i and i* in the output array. The winner and nearby units are updated appreciably more than those further away. A typical choice for h(i, i*) is

    h(i, i*) = exp(-‖r_i - r_{i*}‖² / (2σ²)),

where σ is a parameter that is gradually decreased to contract the neighborhood. η is decreased to ensure convergence.
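As an illustration of the rules above, the following short Python sketch (our own minimal example, not code from the paper) implements a one-dimensional Kohonen map over scalar inputs: the winner i* minimizes |w_i - x| and every unit is then moved towards x in proportion to a Gaussian neighborhood h(i, i*); letting σ shrink towards zero recovers plain winner-take-all competitive learning.

    import math
    import random

    def kohonen_step(weights, x, eta=0.05, sigma=1.0):
        """One update of a 1-D Kohonen map over scalar inputs.
        weights : list of weight values, one per output unit
        x       : current input sample
        Returns the index of the winning unit."""
        winner = min(range(len(weights)), key=lambda i: abs(weights[i] - x))
        for i in range(len(weights)):
            h = math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
            weights[i] += eta * h * (x - weights[i])   # delta-w = eta * h * (x - w)
        return winner

    if __name__ == "__main__":
        random.seed(1)
        w = [random.random() for _ in range(8)]        # eight output units
        for _ in range(2000):
            kohonen_step(w, random.random(), eta=0.05, sigma=0.5)
        print([round(v, 2) for v in sorted(w)])        # weights spread over [0, 1]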
Figure 2: Kohonen's network.

Using the principles derived from competitive learning and building on earlier work on the self-organizing properties of neural networks, we propose and evaluate a neural network approach to route Web page requests in a multiple-server environment. The scheduling algorithms for traditional distributed systems are not applicable to control Web server clusters because of the non-uniformity of load from different client domains, the high variability of real Web workloads, and the high degree of self-similarity in Web requests [LTWW94] [CB97].

4. Conceptual Framework and Formulation of the Problem

A new model to ensure high performance of a web server using neural networks is proposed in this paper. One technique for improving performance at web sites is to cache data at the client site so as to reduce the load on the server. However, this approach is limited by the cache capacity and is very inefficient for serving dynamic web objects, because copies of data may have to go through several layers of software and the file system, require intermediate processing and object fetches from the cache, and then again be passed through the operating system layers of the client machine. To handle increasing web traffic we propose a technique to assign web objects to the server caches at the router level. This router may be one of the servers or a dedicated machine. Figure 3 presents a conceptual framework of the proposed model. The object ids and request counts of the most frequently accessed objects are kept in the router's memory. The actual objects reside in the servers' caches.
Figure 3: A conceptual framework of our model.

The underlying principle of our model is the competitive learning property of neural networks. The model presented here is an extension of earlier work [PO96], [P92] in which a competitive learning algorithm was used to restore and segment images. Here each server competes to serve the request for an object (see Equation 1, section 4.2). The server that is closest to the request is the one that wins, and if the object is already in that server's cache, we strengthen the relationship between the server and the object request (see Section 4.3). Thus a competition-based algorithm is used to make routing decisions. This approach is also related to Kohonen's self-organizing feature map [K89] technique, which facilitates mapping from higher- to lower-dimensional spaces so that the relationships among the inputs are reliably preserved in the outputs. In Kohonen's algorithm, the output neurons that are updated are those close to the winning neuron in the topological structure. In our model, the connection strengths between the object requests and the servers are analogous to the weight connections between the two layers of Kohonen's model. The input layer consists of the object requests and the output layer consists of the servers. In addition to mapping the input space of Web object requests to the weight space, we use this property to classify the object requests to the servers.

4.1 Description of the New Model

In our neural placement model, we define two layers in the network: the input layer W and the output layer S. Layer W contains M nodes; each object is assigned a node. The object request count, R_i, is applied as the input to the node corresponding to the object in the W layer. The layer S has N nodes, one for each server, and each node j is assigned to a server S_j. The weight w_ij connects node i in the input layer to node j in the output layer. This weight forms the connection strength from an object <O_i, R_i> to a server S_j. A pictorial representation of this architecture is given in Figure 4.
Figure 4: Neural architecture to direct a Web page request to the appropriate server.

We approximate the number of input nodes (M) from the number of objects, such as images, HTML pages, sound clips, etc., in a given Web site. Addition or deletion of input nodes (for new or removed objects) does not affect the existing structure of the model, because the links of the new node to the servers (S_1, S_2, ..., S_N) are the only ones to be added (see Figure 4 and equation 2) and no change in the existing links and weights is needed. In our simulations, the number (M) of objects ranged from 150 to 1050. Kohonen's algorithm maps the topological structure of the input onto the weights, whereas in our model we transfer the page popularity structure of the Web object requests to the weight structure between the input and output layers and use competition to transfer the request to the winning server cache; at the same time we add a balancing factor to the decision process to allocate object requests equitably among the server caches and thus balance the load.

4.2 Mathematical Formulation Using Competitive Learning

We formulate the problem of assigning web objects to the servers as a mapping k : W → S for the placement of <O_i, R_i> ∈ W onto a server space S, such that every O_i ∈ W is assigned to some S_j ∈ S (i = 1, ..., M and j = 1, ..., N), with the condition to allocate (classify) the objects O_i such that the objects are distributed equally among the servers S_j, to ensure equitable load among the servers, and at the same time maximize the hits in the servers' caches. Our objective is to optimize two things: (1) increase the number of hits, in the sense that a server receives the same object requests that it has served before; as our simulation results show, this in turn accelerates the performance of the web site, allowing fast response and fast loading of dynamic web objects; and (2) distribute the object requests among the servers in such a fashion that they are shared equitably. Our algorithm achieves both goals by learning the earlier requests made and also learning the distribution of object requests. The server k for a given object is selected in the following way. The request count for the requested object is applied as the input to the node corresponding to that object. The server to which the
request is forwarded is chosen as the one for which the absolute difference between the request count and the weight connecting to that server is smallest. Thus,

    |R_i - w_ik| = min_j |R_i - w_ij|,  j = 1, ..., N.    (1)

It is better to use a normalized value of R_i rather than the raw count, to prevent dealing with very large numbers. Learning is achieved by updating the connection strength between the object and the winning server using the update rule

    Δw_ik = η Λ(R_i, w_ik, K) (R_i - w_ik) + αK (Σ_τ w_iτ - N w_ik).    (2)

Here η, α, and K are parameters that determine the relative strength of maximizing hits versus balancing the load among the servers, and Λ(R_i, w_ik, K) is given by

    Λ(R_i, w_ik, K) = exp(-g² / (2K²)) / Ψ(d, K),  where g = (R_i - w_ik),
    Ψ(d, K) = Σ_j exp(-d² / (2K²))  and  d = (R_i - w_ij),  j = 1, ..., N.    (3)

Now, by integrating equation (2) and after some algebraic manipulations, we get the energy function [P92]:

    E = ηK Σ_{m,k} ln Σ exp(-d² / (2K²)) + α Σ_{m,k} (w_{i,j-1} + w_{i,j+1} - 4w_{i,j} + w_{i-1,j} + w_{i+1,j})².    (4)

Since the update rule given in equation (2) is of the form -∂E/∂w_ik, the update rule is a gradient descent rule for the energy function given in equation (4).

4.3 Heuristics on the Selection of Parameters

The neighborhood function Λ(R_i, w_ik, K) is 1 for i = k and falls off with the distance |R_i - w_ij|. Ideally, we should select the neighborhood such that the servers with the mean hit count are close together, or servers that serve related objects are closer in a neighborhood. The first term on the right-hand side of equation (4), when minimized by the gradient descent rule (2), pulls the weight w_ik towards the request count R_i, thereby ensuring that requests for objects that are in server S_k's cache will be directed to server S_k. This increases the likelihood that an object request that was directed to server S_k will be directed to the same server again, thereby increasing the hit ratio. The second term ensures that no one server will be overloaded, that is, that the object requests are distributed evenly among the servers. By a proper balance of the parameters η, α, and K, we can direct the
flow of traffic in an optimal fashion. The parameters η, α, and K are related as η ∝ 1/(αK). Higher η and lower αK mean we stress object hits more than load balancing, and higher αK means we give more weight to balancing among the servers. Setting α = 0 would mean maximizing web object hits only; we have achieved close to 100% hits using small values of α and still had good load balancing among the servers.

4.4 The Algorithm and Its Distinctive Features

An outline of the algorithm follows (an illustrative code sketch of these steps is given after the list of distinctive features below):

    Initialize M to the number of objects and N to the number of servers;
    Initialize the weights {w_ij} between the page requests and the servers with
    random values, and select parameters η, α, and K.
    While (server available)   // Service page requests while the server is running.
    {  // begin while
        Calculate |R_i - w_ik| and, using equation (1), select server S_k;
        if (<O_i, R_i> is in the cache of S_k)
        {
            Update the weight using equation (2)
            // This is a hit, so we want the algorithm to strengthen this connection.
        }
        else
        {
            Find the correct server S_j using |R_i - w_ij|
            Update the connection weight for this server using equation (2)
            // In this case this is a miss, so we want the algorithm to learn the
            // correct response by strengthening the appropriate weight.
        }
    }  // end while

Our approach has the following distinctive features:

- Our approach maps the self-similar nature of web traffic to the self-organizing property of the neural model.
- In our construction, a request for a web object has to be allocated to a server. The optimality of this allocation is obtained through a learning rule by which the network adapts to unpredictable changes within the framework of competitive learning. Stable responses are generated through the process of learning.
- The learning rule has an associated energy function, which captures global knowledge about the load information on the servers. The learning rule is a gradient descent rule for this energy function, so the algorithm uses global knowledge about the load to make local decisions about load balance.
- The algorithm creates a balance between the conflicting requirements of distributing web requests equitably among servers and, at the same time, allocating requests to servers so as to maximize the hit ratio.
- At present, the algorithm uses a winner-take-all strategy. We may modify the algorithm to incorporate leaky learning [H91], in which we define a neighborhood of servers (cooperative, or "buddy", servers) whose weights are updated in proportion. This scenario becomes relevant when we consider the interconnections between servers and dependencies between objects that may be distributed among closely connected servers.
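The routing and update steps outlined above can be written out as the following minimal Python sketch. This is an illustration under our own assumptions: the class and method names are ours, the request count R is assumed to be already normalized, and equations (1)-(3) are implemented as reconstructed in Section 4.2; it is not the authors' implementation.

    import math
    import random

    class NeuralRouter:
        """Minimal sketch of the competitive routing scheme of Section 4.
        w[i][j] is the connection strength from object i to server j."""

        def __init__(self, num_objects, num_servers, eta=0.5, alpha=0.01, K=1.0):
            self.N = num_servers
            self.eta, self.alpha, self.K = eta, alpha, K
            self.w = [[random.random() for _ in range(num_servers)]
                      for _ in range(num_objects)]

        def route(self, obj, R):
            """Equation (1): choose the server whose weight is closest to the
            normalized request count R of object obj."""
            row = self.w[obj]
            return min(range(self.N), key=lambda j: abs(R - row[j]))

        def update(self, obj, R, k):
            """Equation (2): pull w[obj][k] towards R (hit term, scaled by the
            neighborhood of equation (3)) and add a load-balancing term."""
            row = self.w[obj]
            expo = [math.exp(-((R - row[j]) ** 2) / (2.0 * self.K ** 2))
                    for j in range(self.N)]
            lam = expo[k] / sum(expo)                    # equation (3)
            balance = sum(row) - self.N * row[k]         # sum_tau w_i,tau - N * w_ik
            row[k] += self.eta * lam * (R - row[k]) + self.alpha * self.K * balance

Setting alpha = 0 in the sketch reduces the rule to pure hit maximization, while increasing alpha*K shifts the emphasis towards load balancing, mirroring the discussion in Section 4.3.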
5. Experimental Results

Our model incorporates the characteristics of Web traffic and the self-similarity inherent in real Web traffic by modeling the traffic through a heavy-tailed distribution of the type P[X > x] ~ x^(-γ) as x → ∞, for 0 < γ < 2. The results reported here correspond to a Pareto distribution with probability density function p(x) = γ k^γ x^(-γ-1), where γ = 0.9 and k = 0.1. We conducted the simulations in two separate environments: (1) the Louisiana State University Networks Lab, where the Web traffic was simulated using the Pareto distribution on a single PC; and (2) the Computer Science laboratory at Louisiana Tech University, where the simulation environment consisted of a network of five IBM PCs. One PC was used as a dedicated Web server written in Java specifically for this experiment; one PC was used as a proxy server; and on the remaining three PCs, referred to as client PCs, the Microsoft Internet Explorer (IE) and Netscape browser settings pointed to the proxy server for all requests. In addition to IE, each client PC simulated 300 clients by creating a separate thread (using Java) for each client. We implemented the neural model on the PC running the proxy server.

We compared the performance of our algorithm with Round Robin (RR), Round Robin 2 (RR2), and a special case of the Adaptive Time-to-Live (TTL) algorithm. In the RR2 algorithm, a Web cluster is partitioned into two classes, normal domains and hot domains, based on domain load information; the Round Robin scheme is then applied separately to each class. Details of this algorithm are given in [CCY99a]. In our implementation of the Adaptive TTL algorithm, we assigned a lower TTL value when a request originated from a hot domain and a higher TTL value when it originated from a normal domain; in this way the skew on Web objects is reduced. The following table gives the characteristics of our simulations.

Table 1: Simulation Characteristics

    Sample Size                   Number of Web objects ranged from 150 to 1050; statistics were collected at intervals of 50 objects.
    Number of Servers             Statistics were collected for 4, 8, 16, and 32 servers.
    Web Page Distribution Used    Uniform and Non-Uniform (Pareto).
    Algorithms                    Neural Network (NN), Round Robin (RR), Round Robin 2 (RR2), and Adaptive Time-to-Live (TTL).
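To indicate how such a comparison might be driven, the following sketch routes a stream of Pareto-skewed object requests through the neural router and through plain round robin and reports the resulting hit ratios. It reuses the pareto_sample and NeuralRouter sketches given earlier, models each server cache as a fixed-size set of object ids, and updates the winning server's weight on both hits and misses; all of these are our simplifying assumptions, not the paper's exact experimental setup, so the numbers it prints will not reproduce the results reported below.

    def simulate(num_objects=500, num_servers=8, num_requests=20000, cache_size=64):
        """Compare hit ratios of neural routing and round robin on a
        Pareto-skewed request stream (illustrative cache model only)."""
        router = NeuralRouter(num_objects, num_servers)
        caches_nn = [set() for _ in range(num_servers)]
        caches_rr = [set() for _ in range(num_servers)]
        counts = [0] * num_objects
        hits_nn = hits_rr = 0
        for t in range(num_requests):
            # Map a heavy-tailed sample to an object id: low ids are popular.
            obj = min(int(pareto_sample(0.9, 0.1) / 0.1) - 1, num_objects - 1)
            counts[obj] += 1
            R = counts[obj] / (t + 1)            # normalized request count
            k = router.route(obj, R)             # neural routing, equation (1)
            if obj in caches_nn[k]:
                hits_nn += 1
            elif len(caches_nn[k]) < cache_size:
                caches_nn[k].add(obj)
            router.update(obj, R, k)             # simplified: always update winner
            j = t % num_servers                  # round robin routing
            if obj in caches_rr[j]:
                hits_rr += 1
            elif len(caches_rr[j]) < cache_size:
                caches_rr[j].add(obj)
        return hits_nn / num_requests, hits_rr / num_requests

    if __name__ == "__main__":
        print(simulate())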
The comparison charts in the following discussion relate only to the Round Robin scheme and our proposed neural network based algorithm. The results (hit ratios) for the Adaptive TTL algorithm varied widely for different input sizes of Web objects and for different input object distributions, but never exceeded 0.68.

Figure 5: Performance of the page placement algorithm using competitive learning (Neural Network) versus Round Robin algorithms, for a non-uniform input data distribution. (The chart plots hit ratio against the number of page requests, from 150 to 1050, for RR and NN with 4, 8, 16, and 32 servers.)
Figure 6: Performance of the page placement algorithm using competitive learning (Neural Network) versus Round Robin algorithms, for a uniform input data distribution. (The chart plots hit ratio against the number of page requests, from 150 to 1050, for RR and NN with 4, 8, 16, and 32 servers.)

As can be seen from Figure 5, the Neural Network (NN) competitive learning algorithm performs much better than the Round Robin schemes (including RR2) when the input pages follow a Pareto distribution. As the number of input objects increases, our algorithm achieves a hit ratio close to 0.98, whereas the round robin schemes never achieved a hit ratio of more than 0.4. For the NN algorithm, the lower hit ratio (0.86) with a smaller number of objects is attributed to the learning phase of the algorithm; as the algorithm learns, the hit ratio asymptotically stabilizes at 0.98 for larger numbers of objects. For a uniform distribution of input objects, the NN algorithm performs similarly to the non-uniform case and much better than the Round Robin schemes (see Figure 6).

Table 2: Comparison of maximum hit ratio achieved (hit ratio, input size, number of servers)

                    RR                NN
    Uniform         (0.31, 150, 4)    (0.98, 1050, 4)
    Non-Uniform     (0.32, 150, 4)    (0.98, 1050, 4)

The Round Robin scheme never achieves a hit ratio higher than 0.32, whereas NN achieves hit ratios close to 0.98 (see Table 2).

Table 3: Comparison of minimum hit ratio achieved (hit ratio, input size, number of servers)

                    RR                  NN
    Uniform         (0.03, 1050, 32)    (0.85, 150, 32)
    Non-Uniform     (0.02, 1050, 32)    (0.86, 150, 32)

Even in the worst case, NN achieves a hit ratio as high as 0.85 with 32 servers, whereas the RR schemes go as low as a 0.02 hit ratio (see Table 3).

6. Conclusions

Our analysis shows the following results: (1) The performance of our algorithm (NN) increases considerably (from a 0.85 hit rate to 0.98, compared to 0.02 to 0.38 for the Round Robin scheme) as the traffic increases, whereas the performance of Round Robin decreases. This result holds true irrespective of the number of servers. It is a consequence of the learning component in equation (2) pushing an object towards the same server.
(2) For a uniform distribution of Web object requests at a lower traffic rate with a large number of servers (16 and 32), both algorithms perform equally well. But as the traffic increases, our algorithm performs much better than the RR scheme. (3) For a non-uniform distribution (the Pareto distribution), our algorithm performs considerably better at both lower and higher traffic rates, and this holds irrespective of the number of servers. For the Pareto distribution, which closely models real Web traffic, the better performance of our algorithm at larger input rates of Web objects is a particularly attractive result.

In summary, we have presented a novel approach using a competitive neural network model for fast allocation of web pages and for balancing the load among servers. Our algorithm has the ability to learn the arrival distribution of Web requests and adapts itself to the load and to the changing characteristics of web requests. Simulation results show an order of magnitude improvement over some existing techniques. We are currently working on developing methodologies to include meta-data related to web pages in our algorithm to further improve our approach.

7. References

[BBM97] M. Baentsch, L. Baum, and G. Molter, Enhancing the Web's Infrastructure: From Caching to Replication, IEEE Internet Computing, Vol. 1, No. 2, pp. 18-27, Mar.-Apr. 1997.

[BM98] M. Beck and T. Moore, The Internet2 Distributed Storage Infrastructure Project: An Architecture for Internet Content Channels, Proc. of the 3rd Workshop on WWW Caching, Manchester, England, June 1998.

[CB97] M. E. Crovella and A. Bestavros, Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, IEEE/ACM Transactions on Networking, Vol. 5, No. 6, pp. 835-846, December 1997.

[CCY99a] V. Cardellini, M. Colajanni, and P. Yu, DNS Dispatching Algorithms with State Estimators for Scalable Web-Server Clusters, World Wide Web Journal, Baltzer Science, Vol. 2, No. 2, July 1999.

[CCY99b] V. Cardellini, M. Colajanni, and P. Yu, Dynamic Load Balancing on Web-Server Systems, IEEE Internet Computing, Vol. 3, No. 3, pp. 28-39, May-June 1999.

[CK01] K. G. Coffman and A. M. Odlyzko, Growth of the Internet: Preliminary Research, available at http://www.dtc.umn.edu/~odlyzko/doc/oft.internet.growth.pdf. See also Internet Growth: Is There a "Moore's Law" for Data Traffic?, in Handbook of Massive Data Sets, J. Abello, P. M. Pardalos, and M. G. C. Resende, eds., Kluwer, 2001.

[GB99] M. Grossglauser and J.-C. Bolot, On the Relevance of Long-Range Dependence in Network Traffic, IEEE/ACM Transactions on Networking, 7(5), pp. 629-640, October 1999.
[H91] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Lecture Notes Vol. 1, Reading, MA: Addison-Wesley Publishing Co., 1991.

[IC97] A. Iyengar and J. Challenger, Improving Web Server Performance by Caching Dynamic Data, in Proceedings of the USENIX Symposium on Internet Technologies and Systems, December 1997.

[K89] T. Kohonen, Self-Organization and Associative Memory, 3rd ed., Berlin: Springer-Verlag, 1989.

[LT94] W. E. Leland, M. S. Taqqu, and D. V. Wilson, On the Self-Similar Nature of Ethernet Traffic (Extended Version), IEEE/ACM Transactions on Networking, 2(1), February 1994.

[LTWW94] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, On the Self-Similar Nature of Ethernet Traffic (Extended Version), IEEE/ACM Transactions on Networking, 2(1):1-15, 1994.

[MSE99] A. Modares, S. Somhom, and T. Enkawa, A Self-Organizing Neural Network for Multiple Traveling Salesman and Vehicle Routing Problems, International Transactions in Operational Research, 6 (1999), pp. 591-606.

[P92] V. Phoha, Image Recovery and Segmentation Using Competitive Learning in a Computational Network, Ph.D. Dissertation, Texas Tech University, Lubbock, Texas, 1992.

[PF95] V. Paxson and S. Floyd, Wide-Area Traffic: The Failure of Poisson Modeling, IEEE/ACM Transactions on Networking, 3(3):226-244, June 1995.

[PO96] V. Phoha and W. J. B. Oldham, Image Restoration and Segmentation Using Competitive Learning in a Layered Network, IEEE Transactions on Neural Networks, Vol. 7, No. 4, pp. 843-856, July 1996. (See also V. V. Phoha and W. J. B. Oldham, Corrections to "Image Restoration and Segmentation Using Competitive Learning in a Layered Network", IEEE Transactions on Neural Networks, Vol. 7, No. 6, November 1996.)

[YY98] K. Yeung and T. Yum, Node Placement Optimization in ShuffleNets, IEEE/ACM Transactions on Networking, Vol. 6, No. 3, pp. 319-324, June 1998.