
Mar. 2004, Vol.19, No.2, pp J. Comput. Sci. & Technol.

Web Caching: A Way to Improve Web QoS

Ming-Kuan Liu 1;2, Fei-Yue Wang 1;2, and Daniel Dajun Zeng 1;3

1 The Lab for Complex Systems and Intelligent Sciences, Institute of Automation, The Chinese Academy of Sciences, Beijing, P.R. China
2 Systems and Industrial Engineering Department, University of Arizona, Tucson, AZ 85721, USA
3 Management Information Systems Department, University of Arizona, Tucson, AZ 85721, USA

mingkuan@ .arizona.edu; feiyue@ .arizona.edu; zeng@bpa.arizona.edu

Received July 30, 2002; revised June 2,

Abstract As the Internet and World Wide Web grow at a fast pace, it is essential that the Web's performance keep up with increased demand and expectations. Web caching technology has been widely accepted as one of the effective approaches to alleviating Web traffic and increasing Web Quality of Service (QoS). This paper provides an up-to-date survey of the rapidly expanding Web caching literature. It discusses state-of-the-art Web caching schemes and techniques, with emphasis on recent developments in Web caching technology such as differentiated Web services, heterogeneous caching network structures, and dynamic content caching.

Keywords Web traffic, Web caching, Web QoS, differentiated service, dynamic content caching

1 Introduction

The Internet and World Wide Web have experienced tremendous growth in the past decade. As a direct result of the Web's popularity and increasing acceptance, Web network bandwidth demands have been increasing much faster than bandwidth capacity has expanded, resulting in network traffic congestion and Web server overload. Many researchers have been working on how to improve Web performance since the early 1990s, and many approaches have been proposed and studied [1].
Among them, Web caching technology has emerged as one of the effective solutions to reduce Web traffic congestion, alleviate Web server overload, and, in general, improve the scalability and Quality of Service (QoS) of Web-based systems [1 7]. The idea of using caches to improve system performance was developed and applied successfully long before the advent of the Web. The best-known applications of caching are in CPUs, RAM, and file systems, where caches are defined as fast, temporary stores for commonly used items [8 10]. Similar caching ideas can be extended to Web-based systems. Caching user-requested documents at the client Web browser or at local Web caching proxy servers lowers the user-perceived access latency, defined as the amount of time elapsed between the time when a user request is issued and the time when the requested object is returned to the user's browser. In addition, Web caching has the potential of reducing network bandwidth consumption, thereby alleviating Web traffic congestion. Moreover, Web caching can alleviate the workload of original Web servers by reducing the number of client requests. Lastly, Web caching may improve the failure tolerance and robustness of the whole Web system by maintaining cached copies of Web documents and serving user requests even when original servers or networks become temporarily unreachable. Web caching has experienced rapid growth in recent years [6]. The academic literature on Web caching is quickly expanding. Numerous conferences and workshops with a specific emphasis on Web caching have been held by the engineering and business research communities. In addition, many caching-related commercial offerings have been launched and accepted as part of the general Web infrastructure and as a specific type of value-added Web-based commercial service [11 13].
Although surveys on Web caching technology exist in the literature (e.g., [7]), we conclude that recent developments in Web caching warrant a significantly updated survey. This paper represents our effort to understand and systematize the significant technical issues being addressed by the recent Web caching literature, with emphasis on new trends in Web caching research, including differentiated Web QoS, heterogeneous caching architectures, and dynamic content caching.

(Regular Paper. This work is supported in part by the Outstanding Oversea Scholar Award and the Outstanding Young Scientist Research Fund from the Chinese Academy of Sciences. The authors of this paper are listed alphabetically; all contributed equally to its completion.)

In the remainder of this section, we present a

high-level taxonomy of Web caching systems and lay out the scope of our survey.

Web contents can be cached at different locations along the path between clients and original Web servers. According to the location of caches, Web caching systems can be classified into three types: browser caches, proxy caches, and surrogate caches.

Browser Caches. Most modern Web browsers have built-in caches. Browser caches utilize the client's local hard disk or RAM to cache Web documents. The user can customize the size of the browser cache. Browser-based caching typically works for only one user, and sharing of cached contents among different users is not allowed. This type of caching can significantly reduce Web access latency when the user repeatedly accesses certain Web pages.

Proxy Caches. Unlike browser caches, Web proxy caches are located between client computers and Web servers and function as a data relay. A typical Web proxy cache serves many users at the same time. When the Web caching proxy server receives a user request, it first looks up the requested object in its cache. If a fresh cached copy is found, the proxy returns it to the client. Otherwise, it relays the request to other cooperating cache proxies or the original Web server, and returns the retrieved fresh document to the client while leaving a copy in its own cache. Proxy caches are usually placed close to the client.

Surrogate Caches. Surrogate caches work similarly to proxy caches; the key difference is that surrogate caches are typically located close to Web servers. While the main goal of proxy caches is to reduce Web access latency, that of surrogate caches is to alleviate Web servers' workload.
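Both proxy and surrogate caches follow the same basic lookup-and-relay loop described above. As an illustration only (the paper does not give code; the names `ProxyCache` and `fetch_from_origin` are hypothetical), a minimal sketch of that loop might look like:

```python
import time

class ProxyCache:
    """Minimal sketch of a proxy cache's lookup-and-relay behavior."""

    def __init__(self, fetch_from_origin):
        self.store = {}                          # url -> (object, expiry time)
        self.fetch_from_origin = fetch_from_origin

    def get(self, url):
        entry = self.store.get(url)
        if entry is not None:
            obj, expires = entry
            if time.time() < expires:            # fresh cached copy: a hit
                return obj
            del self.store[url]                  # stale copy: treat as a miss
        # miss: relay to a cooperating cache or the origin server
        obj, ttl = self.fetch_from_origin(url)
        self.store[url] = (obj, time.time() + ttl)   # leave a copy in the cache
        return obj
```

A second request for the same URL within its time-to-live is then served from the local store without generating Web traffic.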
The definition of surrogate caches is given in RFC 3040 as "a gateway co-located with an origin server, or at a different point in the network, delegated the authority to operate on behalf of, and typically working in close cooperation with, one or more origin servers" [14]. Surrogate caches can be used to replicate the contents of the corresponding Web servers at many different locations on the Web. Users' requests for objects from these Web servers can then be directed to the nearest surrogate cache that contains the requested contents. As a result, surrogate caches alleviate the workload of the servers and potentially reduce client access latency. Another common use of surrogate caches is to accelerate Web servers' performance. Some Web servers are slow because they have to generate Web pages dynamically. Surrogate caches can be used to cache the responses of these servers to improve server performance.

Proxy caching has been the main focus of current Web caching research for several reasons. First, a proxy cache-based approach makes minimal assumptions about the Web servers and networking protocols. Thus it can be readily applied in a wide spectrum of application settings without modifying Web server behavior or the underlying networking infrastructure. Second, unlike browser caches, proxy caches typically serve a number of users who are on the same subnets. These users, often working for the same organization, have many commonalities in their Web usage. This creates many opportunities to realize the potential performance gains and latency reductions of Web caching. Third, from the system architecture point of view, proxy caches are located on proxy servers that have traditionally been used for other purposes (e.g., network security). This makes the installation and configuration of caching services relatively easy and user-transparent. Recent years have also witnessed a growing trend of providing value-added Web services off proxy servers.
Co-locating caching services with these Web services can ease system integration and maintenance efforts and lead to significant cost savings. This paper mainly focuses on proxy caching.

The rest of the paper is organized as follows. Section 2 presents various architectural designs of Web caching systems. Section 3 analyzes the characteristics of Web traffic. Section 4 describes prefetching, cache replacement, and document coherence maintenance policies. These policies govern various aspects of the operation of a proxy cache. In Section 5, we address inter-cache routing and cooperation problems, which arise when multiple caches work collaboratively. We briefly discuss the performance metrics and evaluation of a caching system in Section 6 and discuss recent trends in Web caching in Section 7. Section 8 concludes the paper with issues for future research.

2 Architectural Designs of Web Caching Systems

2.1 Overall Design of Web Caching Systems

Fig.1 illustrates a high-level architecture followed by most Web caching systems. Web objects can be cached at the clients' local machines, Web cache proxy servers, surrogate servers, Web servers, or any combination of these locations. The following description assumes that caching is enabled at all the above-mentioned locations. Note that it is easy to adapt the description below to fit any caching system in which the caching mechanism is absent at certain locations (e.g., when the Web browser's caching function is turned off).

Fig.1. Overall Web caching system architecture.

When a user requests a Web object, the browser first tries to locate a valid copy in its own cache. If a valid copy is found, the cached copy is presented to the user immediately without incurring any Web traffic. If none is found, a cache proxy server, which typically resides at a network location close to the client machine, is contacted. Upon receiving such a request from a client, the cache proxy first checks its cache for a valid cached copy. If one is found, the cache proxy returns it to the client, which in turn presents the object to the user. If none is found, i.e., a cache miss occurs, the cache proxy directs the request to participating cooperative cache proxies or the original Web server. These cooperative cache proxies process the request in a similar manner and may need to relay it to a surrogate cache. The worst-case scenario is that none of the caches along the network path has a valid copy of the requested object. When this happens, the request is relayed all the way up to the original Web server that publishes the requested Web object.

A critical research issue in developing cooperative Web caches is how to organize and coordinate the behavior of the caches that work cooperatively. In this section, we survey several well-studied architectures for collaborative Web caching. Issues concerning the development of a single proxy cache will be discussed in Section 4.

2.2 Architectural Designs of Cooperative Web Caches

It has been observed recently that a Web caching system relying on a single proxy cache has limited value [15;16]. However, cooperative Web caching that utilizes a network of cache proxy servers to serve a set of clients has shown great potential in (a) improving Web QoS, (b) reducing the chance of certain cache proxy servers becoming the performance and Web traffic bottleneck, and (c) improving the overall system fault-tolerance and robustness. In effect, cooperative Web caching has already been widely used in recent caching systems [17 19].

Cooperative proxy Web caching architectures can be divided into three major categories: hierarchical [17;20], distributed [18;21], and hybrid [19;22].

Hierarchical Caching Architectures. The hierarchical caching architecture, as shown in Fig.2, was first proposed in the Harvest project [17]. In this type of architecture, the caching system consisting of multiple proxy caches is structured as a tree. Each tree node corresponds to a proxy cache and has exactly one parent (with the exception of the root node). When a cache miss occurs at a certain node (i.e., a valid copy of the requested object cannot be found), this node forwards the request to its parent. This forwarding process can be repeated until the requested object is located or the root cache is reached. The root cache may need to contact the origin Web server if it is unable to satisfy the request. Whenever the requested object is found, either at one of the caches or at the original server, it travels back to the client in the reverse order of the cache request chain. A copy of this object is also cached at each of the intermediate caches along this path.

Fig.2. Hierarchical caching architecture.

The hierarchical caching architecture roughly maps to the topology of the Internet organized as a hierarchical network of ISPs and serves to diffuse popular Web objects towards the demand. However, this architecture suffers from several major drawbacks. First, each hierarchy level introduces additional time delays in processing requests. Second, there is significant

redundancy in the storage of Web objects since many objects are cached at all levels of the cache tree. Lastly, caches close to the root need to store a large number of Web objects and can become major performance and traffic bottlenecks.

Distributed Caching Architectures. Fig.3 illustrates the architecture of distributed proxy caching systems. In this architecture, there are only "institutional" caches at the edge of the network that cooperate to serve each other's misses. No intermediate caches between the clients and these institutional caches are used [18;21].

Fig.3. Distributed caching architecture.

Because of the lack of the simple chain of referencing and control found in hierarchical caching architectures, facilitating cooperation between institutional caches is relatively complex and can significantly impact cache performance. Several key coordination mechanisms that have been developed in the literature are surveyed below.

• Institutional caches can query other cooperating caches for Web objects or documents that have resulted in local misses. The Internet Cache Protocol (ICP) can be used to transmit these queries and replies between cache servers [19]. The main drawbacks of this query-based approach are: (a) a potentially significant increase in network bandwidth consumption, and (b) potentially long access latency as a result of needing to poll all cooperating caches and wait for responses.

• Institutional caches can keep a digest or summary of the contents of the other cooperating caches [23;24], avoiding the need for expensive queries and polls. Content digests or summaries are periodically exchanged among the institutional caches. To make the distribution of the digests/summaries more efficient and scalable, a hierarchical infrastructure of intermediate nodes can be used.
However, this hierarchical infrastructure only distributes meta information, including the locations of the Web objects, but not the cached objects themselves.

• Institutional caches can also cooperate with each other using a hash function that maps a client request to a certain cache [25 27]. With this approach there are no duplicated copies of the same object in different caches, and there is no need for caches to know about each other's contents. However, having only one single copy of an object among all cooperating caches limits this approach to local environments with well-interconnected caches.

Hybrid Caching Architectures. In a hybrid architecture, caches cooperate with other caches at the same level or at a higher level using distributed caching. ICP is typically used as the underlying communication and coordination protocol in this architecture. For instance, a Web object can be fetched from either a parent or a neighbor cache depending on which cooperating cache offers the lowest round-trip time (RTT). The cooperation between participating caches needs to be carefully planned in order to avoid repeated fetching of objects from distant or slow caches when it might be more efficient to fetch these objects directly from the original Web servers [28].

Recently, Rodriguez and Spanner [22] proposed a mathematical model to analyze the performance of the above three architectures. They show that hierarchical caching systems have lower connection time (the time needed to establish the connection between the client machine and a cache, or between caches), while distributed caching systems have lower transmission time (the time needed to transmit the requested Web object over the network). In addition, hierarchical caching has lower network bandwidth usage, while distributed caching is able to distribute the Web traffic more evenly and reduce "hot spots" (congested network locations). This is because distributed caching makes more use of local network bandwidth.
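The hash-based cooperation mechanism described earlier in this section maps each request deterministically to exactly one cooperating cache. A minimal sketch of that idea follows; the function name is hypothetical, and simple modular hashing is used here purely for illustration rather than a production scheme such as CARP:

```python
import hashlib

def cache_for(url, caches):
    """Map a request URL deterministically to one cooperating cache.

    Every proxy computes the same mapping, so each object has a single
    owner cache, no duplicate copies are stored, and no cache needs to
    know the others' contents.
    """
    h = int(hashlib.md5(url.encode()).hexdigest(), 16)
    return caches[h % len(caches)]
```

Note that with plain modular hashing, adding or removing a cache remaps most URLs; consistent hashing schemes reduce this disruption.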
As for disk storage requirements, distributed caching typically requires relatively little storage space, with an institutional cache needing several gigabytes of space, while hierarchical caching needs more storage space, especially for caches near the root. It is common for high-level caches to have hundreds of gigabytes of storage in a hierarchical caching system. Their analysis also shows that in a hybrid architecture the latency varies greatly depending on the number of caches that cooperate at each network level.

2.3 ISAAC: An Adaptive Web Caching Architecture

Almost all the caching architectures surveyed

above assume that various aspects of the Web environment, such as Web traffic patterns, server and client locations, network connectivity, and the cooperative caching network topology, remain relatively stable. Obviously, because of the dynamic nature of the Web, a caching system that dynamically adapts its behavior to changes in the Web environment has many advantages over non-adaptive approaches. In effect, research on adaptive Web caching has started to emerge in recent years [29]. In this section, we use the ISAAC system, developed at the University of Arizona, as an example to illustrate the basic ideas behind adaptive Web caching.

As illustrated in Fig.4, the ISAAC (short for Intelligent Strategies and Architectures for Adaptive Caching) system consists of five major modules [30]: the client request processor (CRP), the network traffic monitor (NTM), the inter-proxies cooperation manager (IPCM), the cache proxy kernel module (CPKM), and the storage manager (SM). The CRP module is responsible for preprocessing and parsing clients' HTTP requests. The NTM module monitors the current network traffic and provides input to the CPKM module. For example, if the NTM detects that the network is idle, it will inform the CPKM module to initiate the prefetching or coherency maintenance processes (see Section 4 for details about these processes). The IPCM module is responsible for the inter-cache routing and cooperation strategies relevant to cooperative caching systems. The CPKM module implements key Web caching policies such as prefetching, coherency, and replacement. These policies are not hardwired; different types of policies and control parameter settings are invoked to best match the specific caching scenario characterized by the input from the NTM and IPCM. The SM module is responsible for the effective management of hard disk and RAM to enable efficient and fast cache access.
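The NTM-to-CPKM interaction described above amounts to a policy switch driven by monitored load. The following sketch is purely illustrative; the function name, action labels, and idle threshold are our hypothetical choices, not details of the ISAAC implementation:

```python
def choose_action(network_load, idle_threshold=0.2):
    """Pick the caching activity a kernel module might trigger,
    given a normalized network load in [0, 1] from a traffic monitor."""
    if network_load < idle_threshold:
        # Network is idle: schedule background work such as
        # prefetching and coherency (revalidation) maintenance.
        return "prefetch_and_revalidate"
    # Network is busy: serve client requests only, avoid extra traffic.
    return "serve_requests_only"
```

The point of the sketch is that the policy is a runtime decision keyed to monitor input, not a hardwired behavior.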
3 Web Traffic Characteristics

3.1 Basic Web-Based Interaction

The interaction between HTTP clients and servers is illustrated in Fig.5. The amount of time needed to retrieve a Web object when a new connection is required can be approximated by two RTTs plus the time to transmit the response.

Fig.4. The ISAAC cache system structure.

Fig.5. Interaction between a Web client and a server under HTTP.

In the older but still widely-used HTTP version 1.0, a separate TCP connection needs to be opened and closed for each object embedded in a Web page. This increases the user-perceived access latency as well as the communication overhead. To address this problem, the current HTTP protocol, HTTP version 1.1, utilizes so-called persistent connections to eliminate the need to establish multiple TCP connections. Under persistent connections, a TCP connection is kept open for a Web server to serve multiple objects to the requesting client until a time-out expires. According to a recent analysis, however, the utilization of HTTP 1.1 does not result in notable differences in HTTP traffic [31]. There is strong evidence to support the observation that the behavior of Web users strongly affects the nature of TCP connections. In particular, it is shown that the time between two page visits is critical in determining whether existing TCP connections can be re-utilized or new connections have to be

opened. It is unclear how HTTP 1.1, which has replaced HTTP 1.0 as the dominant Web protocol, will affect Web caching research.

3.2 Web Objects and Traffic Patterns

An in-depth understanding of Web traffic patterns can substantially contribute to the development of Web caching technology [31 36]. For instance, temporal locality and spatial locality of reference within user communities are of particular relevance to Web caching. Temporal locality implies that recently accessed objects are more likely to be referenced in the near future. Spatial locality refers to the property that objects neighboring an object accessed in the past are likely to be accessed in the future. Strong empirical evidence exists in support of these localities in the context of Web traffic [32;37;38]. These localities in part help explain the success of Web caching in improving Web QoS and point out fruitful new research directions [38]. Researchers have also identified characteristics of Web documents that can guide the design of Web caching systems and help determine specific caching policies such as prefetching and replacement. Two main characteristics of Web documents relevant to caching are lifetime and modification patterns [39;40]. For instance, more accurate Time-To-Live (TTL) estimation of Web objects can directly lead to more efficient Web object coherency operations. Modification patterns also have a major influence on the design of caching prefetching techniques. Historical information concerning Web objects and their hosting servers can be used to derive useful estimates, possibly in an adaptive caching framework [30].

4 Prefetching, Replacement, and Coherency

In this section, we discuss the main policies that govern various aspects of the inner workings of a single proxy cache system.
Note that most of these policies are applicable to caches located at network locations other than proxy servers (such as client browsers and surrogate servers) as well.

4.1 Prefetching Policies

Cache hit rate, or simply hit rate, is calculated as the number of user-requested Web objects answered by the cache divided by the total number of requested objects. Achieving the highest possible cache hit rate under given resource constraints, such as storage capacity, is one of the primary goals of all Web caching systems. An ideal caching system should achieve a hit rate close to 100%. However, recent results suggest that the highest cache hit rate that can be achieved by the best caching system is usually no more than 40% to 50% [20]. One of the main reasons for this less-than-ideal hit rate is that Web users are constantly seeking new information. Caching only old Web objects that have been visited in the past limits the caching system's capability to serve user requests. One way to improve the hit rate is to anticipate future document requests and prefetch these documents into a local cache before they are actually requested by a user [41 45]. Such prefetching activities can be performed by the cache proxy server when the network load is low and the Web servers are relatively idle. Prefetching delivers an additional benefit in off-line browsing, where new documents that the user will most probably be interested in are automatically prefetched to the local machine. Current Web prefetching mechanisms can be divided into three types: proxy-initiated policies [41;46], server-initiated policies [44], and hybrid prefetching policies [42;43;45;47;48].

Proxy-Initiated Policies. Two main proxy-initiated prefetching policies are the Client-Side-Prefetching policy [41] and the Prediction-by-Partial-Matching (PPM) policy [46]. In the Client-Side-Prefetching policy, what and when to prefetch are decided by the client.
This policy is simple and easy to implement but does not provide any prediction of future requests. The basic idea of the PPM policy is to use past Web access patterns to predict future accesses for individual users. The patterns captured are of the form "User A is likely to access URL U1 right after URL U2". The key advantage of this policy is its potentially high accuracy in predicting users' future accesses. A main open issue with this policy is when to prefetch the objects that are likely to be visited. These timing decisions are particularly important in a real-time online environment with a large number of users.

Server-Initiated Policies. In this type of prefetching mechanism, the server anticipates future document requests and sends the documents to participating proxies. The "Top-10" algorithm is a classical example of server-initiated policies, where the server periodically compiles a list of its most popular objects and serves them to clients or proxies [44]. To make sure that these popular objects are sent only to interested clients, the server periodically singles out a subset of active clients (that

have visited the server frequently) and sends objects only to them. Experiments show that server-initiated policies can be fairly successful in serving future requests (with 60% accuracy) with a small (less than 20%) increase in network traffic [43]. The success of server-initiated policies can be partially explained by the information-pooling effect: the server, as the central access point, has the most complete information concerning user interest in the set of Web objects residing on the server. Server-initiated policies also face several challenges [6;43]. First, because caching is gaining acceptance, server-based usage data may not represent true user interest (e.g., caches can hide many user requests from the server). Second, the server needs to keep track of participating clients and proxies, incurring additional computational and communication overheads.

Hybrid Prefetching Policies. Combining user access patterns from a client machine and general statistics from a server can improve caching prefetching. Yu and Kobayashi presented a general mathematical formulation and related operating policies in [49]. Their model also considers factors such as server response time and document update frequency.

We now summarize the state of the art in prefetching research and point out related ongoing and future research topics. Most existing prefetching models are developed for individual caches. They assume that the local cache storage capacity is unlimited. When predicting users' future requests, these models consider factors such as user-specific historical access patterns, and aggregate document popularities and access frequencies. When making prefetching decisions, additional factors such as the time needed to prefetch the documents and network congestion are also considered. Prefetching remains an active area of study [6].
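The server-side ranking step of the Top-10 algorithm discussed above can be sketched in a few lines. This is an illustration only, under the assumption that the server keeps a flat access log of requested URLs; the function name and log representation are ours, not from [44]:

```python
from collections import Counter

def top_n_objects(access_log, n=10):
    """Rank objects by request popularity, server side.

    `access_log` is a sequence of requested URLs; the most frequently
    requested `n` objects are the candidates to push to the active
    clients or proxies the server has singled out.
    """
    return [url for url, _ in Counter(access_log).most_common(n)]
```

Note that, as the survey points out, such server-side counts undercount true interest once downstream caches absorb repeat requests.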
We believe that the following four topics in prefetching are most likely to produce fruitful research results and have a significant impact on caching practice. In effect, some of these topics have already been actively pursued by caching researchers. First, the data mining literature provides a wide selection of models and efficient algorithms that can be used to learn potentially complex user Web access patterns [50 52]. For instance, mining techniques for association rules can be readily applied to learn user Web behavior. We believe that research at the intersection of data mining and Web caching may lead to meaningful results and benefit caching system design and implementation in general. Second, most current prefetching research does not model system constraints such as cache storage capacity. As a result, important Web object characteristics such as size are ignored in making prefetching decisions. For instance, in the Top-10 policy, all top-ranked popular objects are fetched regardless of their size. In a caching system where local storage capacity is a major limiting factor, a better prefetching policy would consider an object's popularity and size together. Future prefetching research needs to explicitly address these system constraint issues. Third, recent research has started to look into the possibility of integrating prefetching with other aspects of Web caching, such as replacement and coherency policies, to achieve better overall caching performance [53]. For instance, an "overzealous" prefetching policy can better serve some users' future needs but may push many frequently-visited old cached contents out of the cache and in turn hurt the cache's performance. This type of problem, called "thrashing", can be dealt with when prefetching decisions are not made in an isolated manner. Fourth, new research is called for to develop effective prefetching policies that work in the collaborative caching context.
Applying the policies developed for individual caches in collaborative caching can lead to significant waste of resources and sub-optimal performance.

4.2 Replacement Policies

One key resource constraint that any caching system has to consider is the local cache storage space. When the cache is nearly full and new Web objects (including those being prefetched) need to be stored, some existing objects have to be evicted from the cache. Cache replacement policies govern such eviction activities. Three types of replacement policies have been developed in the caching literature: traditional replacement policies, key-based replacement policies, and cost-based replacement policies. Traditional replacement policies include the Least Recently Used (LRU) algorithm [54;55], the Least Frequently Used (LFU) algorithm [56;57], and the Pitkow/Recker policy [58]. Key-based replacement policies make eviction decisions based on a primary key determined by certain characteristics of the Web objects under examination [58;59]. Cost-based replacement policies employ a cost function to rank the Web objects currently in the cache by their appropriateness for eviction [60 62].

Traditional Replacement Policies. LRU and LFU are widely used in computer memory and disk caching systems. LRU replaces the object that

was used least recently, and LFU replaces the object that was used least frequently. Because of their simplicity, they have been adopted by Web caching systems as well [54;56]. The Pitkow/Recker policy is a variation of LRU. Objects are evicted in the order decided by LRU, with one exception: if all the candidate objects were accessed within the same day, then the largest one is removed (rather than the one that was visited closest to the beginning of that day) [58]. The performance of these traditional replacement policies is relatively low, since neither the relevant characteristics of the Web objects (such as size) nor general environment indicators (such as network transmission speed) are considered in making eviction decisions.

Key-Based Replacement Policies. This type of policy uses a primary key to decide which object to evict. Ties are broken using additional keys. A widely-used primary key is the object size. One popular approach evicts the largest object [58]. The LRU-MIN policy uses a more sophisticated method. If there are objects in the cache whose size exceeds a threshold S, LRU-MIN evicts the least recently used of these objects. If the sizes of all the remaining objects are smaller than S, LRU-MIN re-evaluates the objects using a new threshold set to S/2. This process continues until enough space is reclaimed for new objects.

Cost-Based Replacement Policies. A cost-based replacement policy relies on a cost function to decide which object should be evicted. This cost function can take into consideration the characteristics of an object such as last access time, usage frequency, and HTTP headers. It can also depend on general environmental parameters such as the network traffic situation, the time at which eviction has to be performed, and so on.
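The LRU-MIN loop described above (evict the least recently used objects at or above a size threshold, halving the threshold until enough space is freed) can be sketched as follows. The object representation is our assumption for illustration; the cited work does not prescribe one:

```python
def lru_min_evict(objects, need):
    """Sketch of LRU-MIN eviction.

    `objects` is a list of (size, last_access_time) tuples, where a
    smaller last_access_time means less recently used; `need` is the
    amount of space to reclaim.  Returns the list of evicted objects.
    """
    if not objects:
        return []
    s = max(size for size, _ in objects)   # initial size threshold S
    evicted, freed = [], 0
    remaining = list(objects)
    while freed < need and remaining:
        # Candidates at or above the current threshold, LRU first.
        candidates = sorted((o for o in remaining if o[0] >= s),
                            key=lambda o: o[1])
        for obj in candidates:
            remaining.remove(obj)
            evicted.append(obj)
            freed += obj[0]
            if freed >= need:
                break
        s /= 2                             # re-evaluate with threshold S/2
    return evicted
```

The sketch favors evicting few large objects over many small ones, which is the intuition behind size-keyed policies.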
Several cost-based policies are derived from the Greedy-Dual-Size algorithm [63], which considers in an integrated manner an object's size, its retrieval cost, and how recently it has been accessed. These policies work as follows. Based on its size and retrieval cost, an object is given an initial value when it first enters the cache or is accessed by a user. This value then decreases gradually as time goes by if the object is not accessed by any user. The object with the least value is the candidate for replacement. One possible improvement to these Greedy-Dual-Size-based policies is to consider the popularity of an object in addition to its size and recency of access. An important factor to consider while designing cost-based replacement policies is the overhead of computing the cost function. In some cases, high computational overhead may prevent the use of certain otherwise desirable algorithms in practice. Recall that in a cooperative caching system, access characteristics differ significantly across the levels of the caching hierarchy [64-66]. This suggests that different replacement policies should be used at different levels of a caching hierarchy to achieve better performance [67].

4.3 Cache Coherence Policies

As argued before, serving user requests off the cache proxy server can potentially improve Web QoS when cache hits occur for a reasonable portion of user requests. Achieving a high cache hit rate, however, is only one of the challenges facing Web caching. Another major challenge is how to avoid serving "stale" contents to the user when a hit occurs. This challenge is relevant not only to Web caching but also to other caching applications such as distributed file systems [68]. In the context of Web caching, cached objects can quickly become stale or out-of-date when their counterparts on the original Web servers are frequently updated.
The main technical goal of Web caching coherence mechanisms is to avoid or minimize the probability that stale Web objects are used to serve user requests. In other words, caching coherence mechanisms aim to provide users with cached contents that are as fresh as possible compared with their original copies. Existing Web caching coherence mechanisms can be categorized into three classes: proxy-initiated policies, server-initiated policies, and hybrid prefetching policies. Server-initiated policies mainly include Callback [69] and Piggyback Server Invalidation (PSI) [70]. Proxy-initiated policies include Poll Each Read, Poll Periodically [71], Adaptive TTL [72], and Piggyback Cache Validation [73]. Major hybrid policies are Lease [74], Volume Lease [71], and Adaptive Lease [75].

Server-Initiated Policies. We discuss two types of cache coherence policies that are initiated from the server side. The first is the Callback algorithm. In this algorithm [69;72], Web servers keep track of which proxies are caching which objects. When an object is to be modified, the server notifies the proxies that cache this object and waits for their replies. Only after receiving replies from all related cache proxies does the server modify the object. The main advantage of the Callback algorithm is that it maintains the strongest consistency between original Web objects and their cached copies. The main disadvantage is the computational overhead imposed on the server. Such an

overhead is significant when a large number of proxies are present and when the set of cached objects maintained by each proxy server changes constantly. The second server-initiated mechanism is the Piggyback Server Invalidation (PSI) algorithm. The basic idea is for the server to piggyback, on a reply to a proxy, a list of objects that have changed since that proxy's last access. Upon receiving such a list, the proxy invalidates the cached objects on the list and extends the life cycle of the other cached objects not on the list. The key advantage of PSI is the reduced coherence-related messaging and the resulting network bandwidth savings [70].

Proxy-Initiated Policies. There are mainly four coherence policies initiated from the proxy side. The first is the Poll Each Read algorithm. Under this policy, before sending a cached object to the user, the proxy contacts the server hosting the object to find out whether the object is still valid. If it is not, the server sends the latest version of the object to the proxy, which in turn updates its cache and forwards the object to the user. This policy is easy to implement and guarantees strong coherence. The main drawback is the added access latency caused by the validation messages exchanged between the proxy and the server for every object requested by the user. The second policy is the Poll Periodically algorithm. This algorithm is based on Poll Each Read but assumes that a cached object remains valid for at least a certain amount of time after it has been validated. This approach clearly improves access latency compared with Poll Each Read; however, it is difficult to choose an appropriate timeout period [71]. The third policy is the Piggyback Cache Validation (PCV) algorithm [73].
In its simplest form, whenever a proxy cache needs to send a message to a server, it piggybacks a list of cached objects from that server for which the expiration time is unknown and the heuristically determined TTL has expired. The server handles the request and indicates which cached objects on the list are now stale and thus need to be updated. Requests for cached objects that have not recently been validated cause an If-Modified-Since (IMS) GET request to be sent to the server. The main advantage of PCV is that it can reduce access latency by minimizing the number of network connections that need to be established between the server and the proxy. The disadvantage of this algorithm mainly lies in the increased size of the regular request messages due to piggybacking. The computational overhead for the proxy cache is also slightly increased, as it must maintain a list of cached objects on a per-server basis. The additional overhead for the server is that it must validate the piggybacked object list in addition to processing regular requests. The fourth and last proxy-initiated coherence policy is the Adaptive TTL algorithm. The basic TTL policy maintains cache consistency through the time-to-live (TTL) attribute of a cached object. A cached object is considered valid until its TTL expires. The problem with the basic TTL policy is that it is difficult to estimate TTL parameters. The Adaptive TTL policy handles this problem by adjusting the TTL attribute of an object based on observations of its life cycle [72]. This approach takes advantage of the fact that object life cycle distributions tend to be bi-modal: if a file has not been modified for a long time, it tends to stay unchanged. Positive empirical results have demonstrated the usefulness of Adaptive TTL [72].

Hybrid Prefetching Policies. Hybrid prefetching policies refer to coherence policies that actively involve both Web servers and proxy caches. We discuss three major hybrid policies.
The first is the Lease-based coherence algorithm, motivated by the need to improve system scalability and fault tolerance [71;74]. Under this policy, if a proxy or network failure prevents a server from invalidating cached copies of a Web object that needs to be changed, the server needs only to wait until the "lease" expires before modifying the object. A main challenge that Lease-based policies need to address is how to determine the appropriate lease length for a given object. The second policy is the Volume Lease algorithm [71]. The main goal of this algorithm is to exploit spatial locality to amortize overheads across multiple objects in a volume. This algorithm uses a combination of object leases and volume leases. Object leases are associated with individual data objects, while volume leases are associated with a collection of related objects on the same server. This policy has been shown to perform well in a WAN environment [71]. The third and last hybrid policy is the Adaptive Lease algorithm. This algorithm determines optimal lease durations based on a number of factors, including the need to maintain strong consistency, network connection costs, object update frequencies, etc. Adaptive Lease is able to achieve significant improvement over other approaches while maintaining a modest and manageable computational overhead [75].

5 Inter-Cache Cooperation and Routing

A Web caching system based on a single cache poses many limitations [76]. It does not scale well and

the single proxy cache can easily become a performance and communication bottleneck. In contrast, cooperative proxy caching systems, which consist of a set of cache proxy servers serving a group of users, can overcome these limitations. As argued in Section 2, the effectiveness of cooperative caching is largely determined by the cooperation and routing strategy employed to coordinate the participating caches. This section discusses four main types of cooperation and routing strategies developed in the literature: broadcast queries, hierarchical caching, URL hashing, and directory-based routing tables [6].

Broadcast Queries Policy. The Broadcast Queries Policy works as follows. When a cache proxy receives a client request, it first checks whether the requested object can be served from its local cache. If a valid cached copy is not found in the local cache, the cache proxy broadcasts the request to all participating proxies, asking for their help [17]. The main advantage of this policy is its flexibility: the effects of a proxy joining or departing the cooperative caching network are localized to its immediate neighbors. Two main drawbacks of this policy are: (a) a cache proxy has to wait for the last response from its neighbors before concluding that none of them has the requested document and sending the request to the next level of the caching hierarchy; and (b) broadcast queries generate extra network traffic and impose computational overhead on the participating cache proxies.

Hierarchical Caching Policy. Under this policy, when a local miss occurs, the cache proxy simply forwards the missed request to its parent in the caching hierarchy without attempting to query sibling cache proxies.
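The miss-forwarding behavior of the hierarchical policy can be sketched as follows. The class name and the origin-fetch callback are illustrative, not part of any cited protocol:

```python
class CacheNode:
    """Illustrative node in a caching hierarchy: on a local miss the
    request is forwarded to the parent; the root fetches from the
    original (origin) Web server."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.store = {}  # URL -> cached object body

    def get(self, url, fetch_from_origin):
        if url in self.store:            # local hit
            return self.store[url], self.name
        if self.parent is not None:      # local miss: forward to parent
            obj, hit_at = self.parent.get(url, fetch_from_origin)
        else:                            # root miss: go to the origin server
            obj, hit_at = fetch_from_origin(url), "origin"
        self.store[url] = obj            # cache on the way back down
        return obj, hit_at
```

Because every node on the path caches the object on the way back, a later request from a sibling proxy hits at the lowest common ancestor rather than traveling all the way to the original server.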
Besides the obvious savings in communication overhead compared with the Broadcast Queries Policy, this policy has the additional benefit of allowing different organizations to share a common high-level parent proxy without sharing cached contents at the lower levels. The main disadvantages of the hierarchical caching policy are: (a) cache proxies close to the root need to store a large number of objects and can become performance bottlenecks; and (b) the entire hierarchy has to be traversed before it can be concluded that a requested object must be fetched directly from the original Web server.

Directory-Based Cache Cooperation. In this approach, the locations of cached objects are explicitly maintained by a directory server [23]. When a miss occurs at a cache proxy, it queries the directory server to find out which proxy can provide the requested object. Upon receiving a response from the directory server, the proxy either contacts another cache or visits the original Web server directly. To ensure that the directory server always provides fresh location (meta) information, any cache proxy that caches new objects or drops old contents needs to send updates to the directory server. Since the communications between the directory server and the caches do not contain actual content, the messaging overhead associated with this approach is not significant. Another advantage of this approach is that it promotes loose and flexible connections between cache proxies: individual proxies can be added to and removed from the system without the knowledge of other cache proxies. The main disadvantage of this policy is that the directory server may become a single point of failure.

Hashing Function-Based Cache Routing. In a hashing function-based approach, all Web clients share a common hash function that maps any given URL to a hash space. The hash space is partitioned and each set in the partition is associated with one of the sibling caches.
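A minimal sketch of such hash-based request routing is shown below, using a simple modulo partition of an MD5 hash space. This is a simplification for illustration: real protocols such as CARP and Consistent Hashing partition the space more carefully, so that adding or removing a cache remaps only a small fraction of URLs.

```python
import hashlib

def assign_cache(url, sibling_caches):
    """Deterministically map a URL to one of the sibling caches by
    hashing it into a partitioned hash space (illustrative sketch)."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return sibling_caches[int(digest, 16) % len(sibling_caches)]

# Every client applies the same function, so all requests for a given
# URL are routed to the same sibling cache, with no inter-cache queries
# and no duplicate copies across siblings.
```

The trade-off, as the policy's main disadvantage, is that a single proxy handles all requests for a given URL regardless of where they originate.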
When a client needs to access a Web object, it first hashes the URL of the object and then requests it from the sibling cache whose set contains the hash value. If this cache cannot satisfy the request, it retrieves the object from the original server, places a copy in its cache, and forwards the object to the client. Several approaches reported in the literature belong to this general hashing-based framework, including the Cache Array Routing Protocol [77] and the Consistent Hashing algorithm [25]. The advantages of hashing-based approaches are three-fold. First, they are scalable with respect to the number of user requests: more cache proxies can easily be added and the hash space repartitioned. Second, their computational and communication overheads are relatively low. Third, they lead to efficient use of cache storage space because no duplicated copies of the same document are made on different cache proxies. The main disadvantage of these hashing-based approaches is that the same cache proxy must process all requests for a given URL, no matter where the requests come from. This can lead to potential performance degradation.

6 Performance Metrics and Evaluation of Web Caching Systems

6.1 Performance Metrics for Web Caching Systems

Recent years have seen the rapid development of Web caching technology. Many Web caching architectures and policies that govern various operational aspects of caching systems have been developed. Often, these architectures and policies offer competitive advantages in some aspects of caching but deliver unsatisfactory performance in others. To assess the performance of a Web caching system and understand the tradeoffs offered by various caching approaches, it is essential for caching researchers and developers to establish a comprehensive set of performance metrics. This section provides a summary of the performance measures commonly used to evaluate the performance and effectiveness of Web caching systems [67;78-81]. These measures are divided into seven groups.

Web Traffic Patterns. Web traffic-related measurements characterize the environment in which a caching system operates. They provide useful input to the overall design of a caching system (e.g., storage capacity planning and caching network topology determination). Examples of such measurements are: the transfer size distribution, which describes the distribution of the total number of bytes transferred in the network; the file size distribution, which describes the distribution of the sizes of cached objects; and the proxy traffic intensity, which indicates the number of client requests received by a proxy during a given time interval.

Aggregate Performance. This group of measures shows the aggregate performance of a caching system. By "aggregate", we mean that these measures are obtained by treating the given caching system as a whole rather than analyzing its internal components. Two main performance indices in this group are (a) the processing capacity and speed of a cache proxy and (b) the request response time.

Hit Analysis.
There are many performance indices in this group: hit rate and byte hit rate (discussed in Section 4), disk hits (the number of requests resolved by reading the content from disk), memory hits (the number of requests resolved by reading the content from the cache's internal memory), negative hits (the number of requests for uncached objects), and so on.

Inbound Traffic. A cache proxy and its clients rely on an internal network for their communications. The QoS of this inbound network has a major impact on response time. Two related measures are: the Client Connection Time, which is the delay at the proxy from accepting a connection until receiving a parseable HTTP request; and the Proxy Reply Time, which is the time it takes to send a reply to a client after receiving the document from the cache or another server.

Disk Storage Subsystem. Since most hits need to be served from the local disk attached to the cache proxy, the request response time depends directly on the performance of the local disk storage subsystem. Three indices are commonly used in this group: the Disk Traffic Intensity, measuring the number of swap requests per second; the Concurrent Disk Requests, describing the number of concurrent swap requests; and the Disk Response Time, indicating the total time taken to swap a document into or out of the disk cache.

Network Utilization. This group of performance indices measures the average network bandwidth consumption, the latency of network transmission, and the number of hops along the network path, among others.

Outbound Traffic. When a cache miss occurs, communications between the cache proxy and the original Web servers need to take place, which go beyond the internal network linking the proxy and its clients.
The related indices include: the Proxy Connect Time, which is the time taken to send an HTTP request to an original server or other proxies; and the Server Reply Time, which is the time taken to receive a reply from an original server.

6.2 Evaluation of Web Caching Systems

The Web caching community has developed several standard benchmarking tools to evaluate the performance of cache systems. Some of the tools are self-contained and can generate all HTTP requests and responses internally. Others rely on trace log files for requests and on live Web servers for responses. A benchmark tool that uses real URLs obtained from actual Web servers is easy to implement but likely to give inconsistent and irreproducible results. A self-contained benchmark tool is much more complicated to develop but has the advantage of being configurable and reproducible. The most commonly used Web caching benchmark tools are described below.

Web Polygraph. Web Polygraph is a free benchmarking tool for caching proxies, original server accelerators, L4/7 switches, content filters, and other Web intermediaries. It was developed by NLANR and can simulate Web clients and servers as well as generate workloads that mimic typical Web accesses.

Blast. The Blast software package was developed by Jens-S ( Cache/Development/blast.html). It replays trace log files of Web requests. Blast launches a number of child processes in parallel, each handling one request at a time. It also includes a program that simulates a Web server. This simulated server sup-


More information

Wide-area Network Acceleration for the Developing World. Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton)

Wide-area Network Acceleration for the Developing World. Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton) Wide-area Network Acceleration for the Developing World Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton) POOR INTERNET ACCESS IN THE DEVELOPING WORLD Internet access is a scarce

More information

D. SamKnows Methodology 20 Each deployed Whitebox performs the following tests: Primary measure(s)

D. SamKnows Methodology 20 Each deployed Whitebox performs the following tests: Primary measure(s) v. Test Node Selection Having a geographically diverse set of test nodes would be of little use if the Whiteboxes running the test did not have a suitable mechanism to determine which node was the best

More information

ICP. Cache Hierarchies. Squid. Squid Cache ICP Use. Squid. Squid

ICP. Cache Hierarchies. Squid. Squid Cache ICP Use. Squid. Squid Caching & CDN s 15-44: Computer Networking L-21: Caching and CDNs HTTP APIs Assigned reading [FCAB9] Summary Cache: A Scalable Wide- Area Cache Sharing Protocol [Cla00] Freenet: A Distributed Anonymous

More information

Efficient DNS based Load Balancing for Bursty Web Application Traffic

Efficient DNS based Load Balancing for Bursty Web Application Traffic ISSN Volume 1, No.1, September October 2012 International Journal of Science the and Internet. Applied However, Information this trend leads Technology to sudden burst of Available Online at http://warse.org/pdfs/ijmcis01112012.pdf

More information

Task Scheduling in Hadoop

Task Scheduling in Hadoop Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed

More information

Formal Measure of the Effect of MANET size over the Performance of Various Routing Protocols

Formal Measure of the Effect of MANET size over the Performance of Various Routing Protocols Formal Measure of the Effect of MANET size over the Performance of Various Routing Protocols Er. Pooja Kamboj Research Scholar, CSE Department Guru Nanak Dev Engineering College, Ludhiana (Punjab) Er.

More information

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL This chapter is to introduce the client-server model and its role in the development of distributed network systems. The chapter

More information

Web Server Software Architectures

Web Server Software Architectures Web Server Software Architectures Author: Daniel A. Menascé Presenter: Noshaba Bakht Web Site performance and scalability 1.workload characteristics. 2.security mechanisms. 3. Web cluster architectures.

More information

Virtualizing SQL Server 2008 Using EMC VNX Series and Microsoft Windows Server 2008 R2 Hyper-V. Reference Architecture

Virtualizing SQL Server 2008 Using EMC VNX Series and Microsoft Windows Server 2008 R2 Hyper-V. Reference Architecture Virtualizing SQL Server 2008 Using EMC VNX Series and Microsoft Windows Server 2008 R2 Hyper-V Copyright 2011 EMC Corporation. All rights reserved. Published February, 2011 EMC believes the information

More information

Avid ISIS 7000. www.avid.com

Avid ISIS 7000. www.avid.com Avid ISIS 7000 www.avid.com Table of Contents Overview... 3 Avid ISIS Technology Overview... 6 ISIS Storage Blade... 6 ISIS Switch Blade... 7 ISIS System Director... 7 ISIS Client Software... 8 ISIS Redundant

More information

MS SQL Performance (Tuning) Best Practices:

MS SQL Performance (Tuning) Best Practices: MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

LOAD BALANCING AND EFFICIENT CLUSTERING FOR IMPROVING NETWORK PERFORMANCE IN AD-HOC NETWORKS

LOAD BALANCING AND EFFICIENT CLUSTERING FOR IMPROVING NETWORK PERFORMANCE IN AD-HOC NETWORKS LOAD BALANCING AND EFFICIENT CLUSTERING FOR IMPROVING NETWORK PERFORMANCE IN AD-HOC NETWORKS Saranya.S 1, Menakambal.S 2 1 M.E., Embedded System Technologies, Nandha Engineering College (Autonomous), (India)

More information

Big Data Storage Architecture Design in Cloud Computing

Big Data Storage Architecture Design in Cloud Computing Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,

More information

Chapter. Solve Performance Problems with FastSOA Patterns. The previous chapters described the FastSOA patterns at an architectural

Chapter. Solve Performance Problems with FastSOA Patterns. The previous chapters described the FastSOA patterns at an architectural Chapter 5 Solve Performance Problems with FastSOA Patterns The previous chapters described the FastSOA patterns at an architectural level. This chapter shows FastSOA mid-tier service and data caching architecture

More information

Naming vs. Locating Entities

Naming vs. Locating Entities Naming vs. Locating Entities Till now: resources with fixed locations (hierarchical, caching,...) Problem: some entity may change its location frequently Simple solution: record aliases for the new address

More information

Cisco Integrated Services Routers Performance Overview

Cisco Integrated Services Routers Performance Overview Integrated Services Routers Performance Overview What You Will Learn The Integrated Services Routers Generation 2 (ISR G2) provide a robust platform for delivering WAN services, unified communications,

More information

Video Streaming with Network Coding

Video Streaming with Network Coding Video Streaming with Network Coding Kien Nguyen, Thinh Nguyen, and Sen-Ching Cheung Abstract Recent years have witnessed an explosive growth in multimedia streaming applications over the Internet. Notably,

More information

Influence of Load Balancing on Quality of Real Time Data Transmission*

Influence of Load Balancing on Quality of Real Time Data Transmission* SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol. 6, No. 3, December 2009, 515-524 UDK: 004.738.2 Influence of Load Balancing on Quality of Real Time Data Transmission* Nataša Maksić 1,a, Petar Knežević 2,

More information

Chapter 4. Distance Vector Routing Protocols

Chapter 4. Distance Vector Routing Protocols Chapter 4 Distance Vector Routing Protocols CCNA2-1 Chapter 4 Note for Instructors These presentations are the result of a collaboration among the instructors at St. Clair College in Windsor, Ontario.

More information

Location Information Services in Mobile Ad Hoc Networks

Location Information Services in Mobile Ad Hoc Networks Location Information Services in Mobile Ad Hoc Networks Tracy Camp, Jeff Boleng, Lucas Wilcox Department of Math. and Computer Sciences Colorado School of Mines Golden, Colorado 841 Abstract In recent

More information

JoramMQ, a distributed MQTT broker for the Internet of Things

JoramMQ, a distributed MQTT broker for the Internet of Things JoramMQ, a distributed broker for the Internet of Things White paper and performance evaluation v1.2 September 214 mqtt.jorammq.com www.scalagent.com 1 1 Overview Message Queue Telemetry Transport () is

More information

How To Provide Qos Based Routing In The Internet

How To Provide Qos Based Routing In The Internet CHAPTER 2 QoS ROUTING AND ITS ROLE IN QOS PARADIGM 22 QoS ROUTING AND ITS ROLE IN QOS PARADIGM 2.1 INTRODUCTION As the main emphasis of the present research work is on achieving QoS in routing, hence this

More information

Two-Stage Forking for SIP-based VoIP Services

Two-Stage Forking for SIP-based VoIP Services Two-Stage Forking for SIP-based VoIP Services Tsan-Pin Wang National Taichung University An-Chi Chen Providence University Li-Hsing Yen National University of Kaohsiung Abstract SIP (Session Initiation

More information

Networking Topology For Your System

Networking Topology For Your System This chapter describes the different networking topologies supported for this product, including the advantages and disadvantages of each. Select the one that best meets your needs and your network deployment.

More information

www.dotnetsparkles.wordpress.com

www.dotnetsparkles.wordpress.com Database Design Considerations Designing a database requires an understanding of both the business functions you want to model and the database concepts and features used to represent those business functions.

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

A Novel Approach for Load Balancing In Heterogeneous Cellular Network

A Novel Approach for Load Balancing In Heterogeneous Cellular Network A Novel Approach for Load Balancing In Heterogeneous Cellular Network Bittu Ann Mathew1, Sumy Joseph2 PG Scholar, Dept of Computer Science, Amal Jyothi College of Engineering, Kanjirappally, Kerala, India1

More information

Web Caching and CDNs. Aditya Akella

Web Caching and CDNs. Aditya Akella Web Caching and CDNs Aditya Akella 1 Where can bottlenecks occur? First mile: client to its ISPs Last mile: server to its ISP Server: compute/memory limitations ISP interconnections/peerings: congestion

More information

Memory Database Application in the Processing of Huge Amounts of Data Daqiang Xiao 1, Qi Qian 2, Jianhua Yang 3, Guang Chen 4

Memory Database Application in the Processing of Huge Amounts of Data Daqiang Xiao 1, Qi Qian 2, Jianhua Yang 3, Guang Chen 4 5th International Conference on Advanced Materials and Computer Science (ICAMCS 2016) Memory Database Application in the Processing of Huge Amounts of Data Daqiang Xiao 1, Qi Qian 2, Jianhua Yang 3, Guang

More information

co Characterizing and Tracing Packet Floods Using Cisco R

co Characterizing and Tracing Packet Floods Using Cisco R co Characterizing and Tracing Packet Floods Using Cisco R Table of Contents Characterizing and Tracing Packet Floods Using Cisco Routers...1 Introduction...1 Before You Begin...1 Conventions...1 Prerequisites...1

More information

Real-Time Analysis of CDN in an Academic Institute: A Simulation Study

Real-Time Analysis of CDN in an Academic Institute: A Simulation Study Journal of Algorithms & Computational Technology Vol. 6 No. 3 483 Real-Time Analysis of CDN in an Academic Institute: A Simulation Study N. Ramachandran * and P. Sivaprakasam + *Indian Institute of Management

More information

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

More information

Product Guide. Sawmill Analytics, Swindon SN4 9LZ UK sales@sawmill.co.uk tel: +44 845 250 4470

Product Guide. Sawmill Analytics, Swindon SN4 9LZ UK sales@sawmill.co.uk tel: +44 845 250 4470 Product Guide What is Sawmill Sawmill is a highly sophisticated and flexible analysis and reporting tool. It can read text log files from over 800 different sources and analyse their content. Once analyzed

More information

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1)

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1) Naming vs. Locating Entities Till now: resources with fixed locations (hierarchical, caching,...) Problem: some entity may change its location frequently Simple solution: record aliases for the new address

More information

Internet Content Distribution

Internet Content Distribution Internet Content Distribution Chapter 4: Content Distribution Networks (TUD Student Use Only) Chapter Outline Basics of content distribution networks (CDN) Why CDN? How do they work? Client redirection

More information

ADAPTIVE DISTRIBUTED CACHING WITH MINIMAL MEMORY USAGE

ADAPTIVE DISTRIBUTED CACHING WITH MINIMAL MEMORY USAGE ADAPTIVE DISTRIBUTED CACHING WITH MINIMAL MEMORY USAGE Markus J. Kaiser, Kwok Ching Tsui and Jiming Liu Department of Computer Science Hong Kong Baptist University Kowloon Tong, Kowloon, Hong Kong ABSTRACT

More information

Transport Layer Protocols

Transport Layer Protocols Transport Layer Protocols Version. Transport layer performs two main tasks for the application layer by using the network layer. It provides end to end communication between two applications, and implements

More information

REVIEW OF SECURITY AND PRIVACY ISSUES IN CLOUD STORAGE SYSTEM

REVIEW OF SECURITY AND PRIVACY ISSUES IN CLOUD STORAGE SYSTEM International Journal of Computer Science and Engineering (IJCSE) ISSN(P): 2278-9960; ISSN(E): 2278-9979 Vol. 2, Issue 5, Nov 2013, 55-60 IASET REVIEW OF SECURITY AND PRIVACY ISSUES IN CLOUD STORAGE SYSTEM

More information

Citrix EdgeSight Administrator s Guide. Citrix EdgeSight for Endpoints 5.3 Citrix EdgeSight for XenApp 5.3

Citrix EdgeSight Administrator s Guide. Citrix EdgeSight for Endpoints 5.3 Citrix EdgeSight for XenApp 5.3 Citrix EdgeSight Administrator s Guide Citrix EdgeSight for Endpoints 5.3 Citrix EdgeSight for enapp 5.3 Copyright and Trademark Notice Use of the product documented in this guide is subject to your prior

More information

Central vs. Distributed Systems

Central vs. Distributed Systems By Donald W. Larson dwlarson@mac.com http://www.timeoutofmind.com Systems architecture is the framework in which individual systems are fitted together to create a larger system. The resulting architecture

More information

Windows Server 2008 R2 Hyper-V Live Migration

Windows Server 2008 R2 Hyper-V Live Migration Windows Server 2008 R2 Hyper-V Live Migration White Paper Published: August 09 This is a preliminary document and may be changed substantially prior to final commercial release of the software described

More information

Technology White Paper Capacity Constrained Smart Grid Design

Technology White Paper Capacity Constrained Smart Grid Design Capacity Constrained Smart Grid Design Smart Devices Smart Networks Smart Planning EDX Wireless Tel: +1-541-345-0019 I Fax: +1-541-345-8145 I info@edx.com I www.edx.com Mark Chapman and Greg Leon EDX Wireless

More information

STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Mobile Performance Testing

STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Mobile Performance Testing STeP-IN SUMMIT 2014 11 th International Conference on Software Testing June 2014 at Bangalore, Hyderabad, Pune - INDIA Mobile Performance Testing by Sahadevaiah Kola, Senior Test Lead and Sachin Goyal

More information

Parallel Processing over Mobile Ad Hoc Networks of Handheld Machines

Parallel Processing over Mobile Ad Hoc Networks of Handheld Machines Parallel Processing over Mobile Ad Hoc Networks of Handheld Machines Michael J Jipping Department of Computer Science Hope College Holland, MI 49423 jipping@cs.hope.edu Gary Lewandowski Department of Mathematics

More information

Using Peer to Peer Dynamic Querying in Grid Information Services

Using Peer to Peer Dynamic Querying in Grid Information Services Using Peer to Peer Dynamic Querying in Grid Information Services Domenico Talia and Paolo Trunfio DEIS University of Calabria HPC 2008 July 2, 2008 Cetraro, Italy Using P2P for Large scale Grid Information

More information

Performance Optimization Guide

Performance Optimization Guide Performance Optimization Guide Publication Date: July 06, 2016 Copyright Metalogix International GmbH, 2001-2016. All Rights Reserved. This software is protected by copyright law and international treaties.

More information

CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY

CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY 51 CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY Web application operations are a crucial aspect of most organizational operations. Among them business continuity is one of the main concerns. Companies

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Load Distribution in Large Scale Network Monitoring Infrastructures

Load Distribution in Large Scale Network Monitoring Infrastructures Load Distribution in Large Scale Network Monitoring Infrastructures Josep Sanjuàs-Cuxart, Pere Barlet-Ros, Gianluca Iannaccone, and Josep Solé-Pareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu

More information

Chapter 18: Database System Architectures. Centralized Systems

Chapter 18: Database System Architectures. Centralized Systems Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

iseries TCP/IP routing and workload balancing

iseries TCP/IP routing and workload balancing iseries TCP/IP routing and workload balancing iseries TCP/IP routing and workload balancing Copyright International Business Machines Corporation 2000, 2001. All rights reserved. US Government Users Restricted

More information

HP Insight Management Agents architecture for Windows servers

HP Insight Management Agents architecture for Windows servers HP Insight Management Agents architecture for Windows servers Technology brief, 2 nd edition Introduction... 3 A first look at the Insight Management Agents architecture... 3 HP Insight Management agents...

More information

Understanding IBM Lotus Domino server clustering

Understanding IBM Lotus Domino server clustering Understanding IBM Lotus Domino server clustering Reetu Sharma Software Engineer, IBM Software Group Pune, India Ranjit Rai Software Engineer IBM Software Group Pune, India August 2009 Copyright International

More information

Research and Development of Data Preprocessing in Web Usage Mining

Research and Development of Data Preprocessing in Web Usage Mining Research and Development of Data Preprocessing in Web Usage Mining Li Chaofeng School of Management, South-Central University for Nationalities,Wuhan 430074, P.R. China Abstract Web Usage Mining is the

More information

Load Balancing in Distributed Data Base and Distributed Computing System

Load Balancing in Distributed Data Base and Distributed Computing System Load Balancing in Distributed Data Base and Distributed Computing System Lovely Arya Research Scholar Dravidian University KUPPAM, ANDHRA PRADESH Abstract With a distributed system, data can be located

More information