Kingston University London
In-network content caching contributing to the Future Internet Architecture
Dissertation submitted for the Degree of Master of Science in Networking and Data Communications
By DOUNIS CHARALAMPOS
SUPERVISOR: PAPADAKIS ANDREAS
KINGSTON UNIVERSITY, FACULTY OF SCIENCE, ENGINEERING AND COMPUTING
TEI OF PIRAEUS, DEPARTMENTS OF ELECTRONICS AND AUTOMATION
MARCH 2013
Table of Contents
Table of Figures
Abstract
1. Introduction
1.1. Thesis Scope and research goals
2. Content Distribution Methods
2.1. Clustering and Mirroring
2.2. Web Caching
2.2.1. Hierarchical Caching
2.3. Content Distribution Networks
2.3.1. Distribution services and content delivery
2.3.2. Content Distribution Network Benefits
2.3.3. CDN Functionality and components
2.3.4. User Redirection Mechanisms
2.4. Policy rules for content delivery networks
2.4.1. Introduction to network policies
2.4.2. Consistency Models
2.4.3. Alternative CDN policies
2.5. Edge Services
2.6. Application Content Distribution Networks
2.6.1. Introduction to ACDN
2.6.2. ACDN Requirements
3. Content Distribution Business Models
3.1. Business Models
3.2. Content Distribution Business Chain
3.3. Content Distribution Business Models
3.3.1. Content-centric model
3.3.2. Access-centric model
3.3.3. Alternative business models
3.3.4. Peer-To-Peer (P2P) model
3.3.5. Pay-per-view model
4. The Cloud
4.1. Cloud computing features
4.2. Cloud Computing Services
4.3. Cloud Deployment Models
4.4. The Cloud and CDN
4.4.1. Examples of Cloud CDN services
5. Conclusions
References
Table of Figures
Figure 1 Proxy server (source: Wikipedia.org)
Figure 2 Reverse proxy (source: Wikipedia.org)
Figure 3 Cache hierarchy
Figure 4 Content distribution chain
Figure 5 Content-Centric CDN Model
Figure 6 Access-centric model
Figure 7 Content Bridge model
Figure 8 P2P CDN model
Figure 9 The stack of the Cloud
Figure 10 Cloud deployment models
Figure 11 CDN cloud model
Abstract
CDN architectures perform content replication and caching (mainly at the edges of the network). Content Distribution Networks consist of groups of intermediate servers (proxy servers) placed in key positions on the Internet. The key idea is to ensure that content required by an application is retrieved from a "nearby" server. These networks are essentially an intermediate level between servers and customers, a middleware that uses caching techniques, load balancing and replication of information. The Internet of the Future is tightly coupled with the Cloud. The Cloud comprises distributed data centers, which offer economies of scale and cheaper computing resources. Cloud computing provides computation, software, data access, and storage resources without requiring cloud users to know the location and other details of the computing infrastructure. It is reasonable to believe that the Cloud may have an important role in multimedia content delivery, since in-network content replication schemes have to be implemented. In the current thesis we present a comparative analysis of business models in the area of content distribution, namely models used by content distributors, who are responsible for the operation of the CDN. The models are categorized based on the CDN client, which can be either the content provider or the Internet Service Provider (ISP). Typically, the CDN customer selects the content to be copied to the CDN nodes and is charged by the content distributor accordingly. The content distributor then pays a share of the revenue to other Internet business entities that contribute to the delivery of content, such as data centers and backbone providers. We then focus on the economic dimension of the problem of allocating the storage of a cache to multiple nodes, presenting and examining the financial mechanisms that have been proposed to solve it.
These models are compared to existing business models of Cloud providers.
First, we examine the case of hierarchical caching in typical CDNs and elaborate on the effects caused by the requests and content distribution between caches. Secondly, the analysis is extended to include Cloud providers that may be used as intermediaries. The Cloud providers may be included in the model in two ways: By providing Software as a Service (SaaS). SaaS is a software/application delivery model in which software and associated data are centrally hosted on the cloud. SaaS is typically accessed by users through a thin client via a web browser. In this case, the Cloud can be used as an intermediary between the CDN and the end user. By providing Infrastructure as a Service (IaaS). In the most basic cloud-service model, providers of IaaS offer computers (physical or, more often, virtual machines) and other resources. In that case, content distributors may deploy their applications as cloud users and install operating-system images and their application software on the cloud infrastructure. The goal is to study how robust and effective the solution of using the Cloud for content replication and distribution is. Given the aforementioned aims and goals of this Thesis, the research we conduct will hopefully be greatly beneficial to network and content providers.
1. Introduction
In recent years the popularity of the Internet, and especially the World Wide Web, as a source of information and entertainment for people all over the world has grown exponentially. This phenomenal growth is largely due both to the ease of Internet access and browser use, and to the delivery of more attractive content such as audio and video. As end users gain faster access to the Internet from their homes via DSL and fiber-to-the-home technologies, the demand for high quality content will grow steadily in the coming years. In addition, through advanced wireless devices such as mobile phones and handheld computers (PDAs), Internet access is possible from anywhere and at any time. This ever increasing demand for content, however, places a heavy burden on the existing Internet infrastructure. The servers and network connections must be able at any moment to meet the demand, which can be highly variable and unpredictable. The biggest problem is hot-spots. These are created when some content becomes extremely popular, typically for a limited period. Examples of such hot-spots are the broadcast of major sporting and other events, or the public release of popular software via the Internet. All these incidents place a heavy burden on the content's origin server. Furthermore, an intractable problem is the accurate prediction of the required network capacity. In addition, upgrading the servers or network connections to satisfy the demand is not always a feasible solution, as millions of new users may wish to access the same content at any time. The Internet is a complex grid of interconnected networks. Congestion and failures (i.e. packet drops) may occur in many places, such as: the "first mile" that connects the server in a data center to the Internet; the backbone network of a provider; peering points between the providers of network services; and the "last mile" that connects the user to the Internet.
The World Wide Web is based on a client-server architecture. The content is hosted on Web servers, while user requests are served using HTTP (HyperText Transfer Protocol). Users employ browsers as the client software to communicate with the server software. Objects on the Web are uniquely identified by URLs (Uniform Resource Locators), which specify the name of the server hosting the object and the location in the server's filesystem where the object resides. Websites are written in HTML (HyperText Markup Language) and may contain text, pictures, other multimedia content (audio/video) and links to other content. The problem of content distribution stems from the shortcomings of the client-server architecture, which fails to handle the increased demands of multimedia content requiring high bandwidth.
1.1. Thesis Scope and research goals
In the context of the Thesis we will investigate in-network content replication scenarios, evaluating their performance under variable traffic conditions and user request schemes. Different network topologies will be investigated, under various network load and network link conditions. The goal is to study how robust and effective the solution of in-network content replication is, and consequently to examine its feasibility in the Internet. Given the aforementioned aims and goals of this Thesis, the research we conduct will hopefully be greatly beneficial to network and content providers. This thesis will involve the following: A discussion about Content Delivery Networks (CDN). CDN architectures perform content replication and caching (mainly at the edges of the network). Content Distribution Networks consist of groups of intermediate servers (proxy servers) placed in key positions on the Internet. The key idea is to ensure that content required by an application is retrieved from a "nearby" server.
These networks are essentially an intermediate level between servers and customers, a middleware that uses caching techniques, load balancing and replication of information. Moreover, the new tendency in Internet services, the Cloud, will be presented. The Internet of the Future is tightly coupled with the Cloud. The Cloud comprises
distributed data centers, which offer economies of scale and cheaper computing resources. Cloud computing provides computation, software, data access, and storage resources without requiring cloud users to know the location and other details of the computing infrastructure. It is reasonable to believe that the Cloud may have an important role in multimedia content delivery, since in-network content replication schemes have to be implemented. We will then proceed with the presentation and comparative analysis of business models in the area of content distribution, namely models used by content distributors, who are responsible for the operation of the CDN. The models are categorized based on the CDN client, which can be either the content provider or the Internet Service Provider (ISP). First, we will examine the case of hierarchical caching in typical CDNs and elaborate on the effects caused by the requests and content distribution between caches. Secondly, the analysis can be extended to include Cloud providers that may be used as intermediaries. The goal is to study how robust and effective the solution of using the Cloud for content replication and distribution is. Given the aforementioned aims and goals of this Thesis, the research we conduct will hopefully be greatly beneficial to network and content providers. The structure of the current thesis is as follows: In chapter 2 we present a thorough survey of current CDN architectures. In chapter 3, we proceed with the presentation and comparative analysis of business models in the area of content distribution. In chapter 4, Cloud Computing is presented, and we examine how it can be used in content distribution. Finally, chapter 5 concludes the thesis.
2. Content Distribution Methods
Researchers have been studying several approaches to the delivery of multimedia content in a way that offers both the desired reliability and scalability.
2.1. Clustering and Mirroring
One of the approaches that can help solve the problems of fault tolerance and scalability is the local clustering of servers in a data center. However, if the network connection of the data center fails, the entire server complex becomes inaccessible to users. To solve the problem of a single point of failure, content delivery sites may perform content distribution, using techniques such as mirroring (the use of back-end servers that host the same replicated content at several different sites) and multihoming (the use of multiple ISPs for connecting to the Internet). In the content mirroring approach, the content of the origin server is copied entirely to another server, called a mirror or replica. The contents of a mirror must be identical to the content stored on the origin server, so users may access either of the two machines interchangeably. Mirroring and multihoming are popular methods for supporting sites with strict requirements on reliability and scalability. However, these methods do not solve all the problems of connectivity and often introduce additional difficulties into the content distribution system, such as the following: Scaling mirroring to thousands of servers is a tedious process with large demands in administration and management. In the case of multihoming, routing protocols must converge fast enough to support continuous content delivery in case of failure of the initial links.
Mirroring requires continuous synchronization between the mirrors and the origin server. Each of these solutions introduces significant economic costs, which may exceed several times the initial infrastructure and operating costs of a website. In the case of clustering, there must be several servers at each location to process the workload at peak times (which can be orders of magnitude higher than the average load). Multihoming is also expensive, since the backup connections are not utilized under normal conditions. The main disadvantage of mirroring is how to inform users about the existence of multiple mirrors. Usually, the origin server maintains a list of available mirrors, which is supplied to the user, who may then choose an available mirror whenever they visit the site. Users, however, are not always aware of the identity of these mirrors and usually do not know which is the closest or least loaded server. Also, the management of multiple mirrors is demanding, since it may not be fully automatable.
2.2. Web Caching
Caching is a technique similar to mirroring, copying content to multiple locations closer to the consumers of content, but with two key differences. A mirror typically hosts all the content of the content provider (a website, for example), whereas in caching content is copied to the cache on demand. Furthermore, caching is more dynamic than mirroring, as it operates at the object level and is based on a dynamic process of replacing the cache contents in response to changes in demand. When a requested object is not in the cache, the proxy server is forced to load the content from the origin server, store it locally and deliver it to the user. The use of a proxy cache requires configuring the user's web browser, or setting up a transparent proxy which intercepts user requests and redirects them to the cache.
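The on-demand (pull) behavior of a proxy cache described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of any particular proxy: the origin server is simulated by a callable, and LRU is used here as just one of several possible replacement policies.

```python
from collections import OrderedDict

class ProxyCache:
    """Simplified on-demand (pull) cache with LRU replacement."""
    def __init__(self, capacity, origin):
        self.capacity = capacity      # max number of cached objects
        self.store = OrderedDict()    # url -> content, ordered by recency
        self.origin = origin          # callable simulating the origin server
        self.hits = 0
        self.requests = 0

    def get(self, url):
        self.requests += 1
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)       # mark as most recently used
            return self.store[url]
        content = self.origin(url)            # cache miss: fetch from origin
        self.store[url] = content             # store locally for future requests
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)    # evict the least recently used object
        return content

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0
```

The `hit_rate()` method corresponds to the Hit Rate metric used to evaluate caches; counting bytes instead of requests would give the Byte Hit Rate.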
Caching exploits the phenomenon of temporal locality observed in content delivery applications on the Internet. Specifically, requests for an object tend to be concentrated in a relatively narrow time period. After that period, the popularity of the object deteriorates over time. So, by copying the object into a cache, all future requests will be served immediately until the object is replaced by another object. There are several places where a cache may be placed: in the browser, in an ISP network, and near the origin server. So we have the following levels of hierarchy in the placement of caches on the Internet: Browser caches: Located at the lower end of the hierarchy of caches on the Internet. A user-specified portion of the computer's hard disk is used to store objects already "downloaded" to the machine, so that the next visit to a website may be served from its own local cache. It is widely used for web images, since they are usually the largest objects in an HTML page. Proxy caches: Proxies operate under the same principle, but on a larger scale, serving hundreds or thousands of users. Proxy caches are a type of shared cache, as opposed to individual browser caches. Most proxy caches are installed in large corporations and ISPs to reduce bandwidth consumption, as there are many common requests for the same objects by multiple users. Schematically, a proxy cache is illustrated in figure 1. Figure 1 Proxy server (source: Wikipedia.org) Reverse proxy caches: Placed close to the origin server and operating to its advantage. This way a content provider improves its content availability, but all benefits are limited to this provider, as opposed to proxy caches that accommodate
different content providers. In figure 2, a reverse proxy is illustrated taking requests from the Internet and forwarding them to servers in an internal network. Those making requests connect to the proxy and may not be aware of the internal network. Figure 2 Reverse proxy (source: Wikipedia.org) The benefit of using caches to distribute Internet content is threefold. Specifically, caches can help reduce three quantities: Reduction of the number of requests for content served by the origin servers, with a consequent reduction of the CPU burden and of the costs for the required infrastructure of the site. Reduction of network traffic: As each content item is fetched only once from the origin server and subsequently retrieved from the cache, the amount of bandwidth used is reduced. In particular, the volume of traffic sent to the backbone network due to requests for the cached content is reduced, while network traffic shifts to the region (usually as close to the users as possible) where the caches reside. Reduction of the response delay for the end user: If the user request is served by the cache closest to the customer rather than by the origin server, less time is required to retrieve a content item. Typically the performance of a cache is estimated using the following metrics: Hit Rate: The number of requests served by the cache, divided by the total number of user requests.
Byte Hit Rate: Same as above, but measured in bytes of network traffic rather than in requests.
2.2.1. Hierarchical Caching
Although proxy caches are useful and efficient for serving the users of an organization or an ISP, they are suitable for covering only a relatively small user population. Hierarchies of caches try to solve scalability problems by enabling intercommunication between different levels of caches. In hierarchical caching, caches create a mesh and are logically organized in a hierarchy. Each node has a parent, siblings (nodes at the same level) and children. The term "adjacent" refers to the parent or sibling nodes located one cache-hop away. The protocol used for communication between caches in a hierarchy is the Internet Cache Protocol (ICP). Figure 3 Cache hierarchy
There are two main objectives in the design of hierarchical caches. The first is to ensure that popular items are stored at the lower levels of the hierarchy, while less popular items are stored at the higher levels. The second objective is to ensure that when a cache fails, it is possible to serve the request using an adjacent cache, minimizing the need to access the origin server. These two objectives reduce the delay perceived by the user and the overall network traffic. ICP in hierarchical caching works as follows: Suppose there is an end user served by a cache at the organization level, requesting an object. The user request is sent to the local cache. When the cache receives the request, it checks whether it already has the requested object. If it does, content delivery is achieved immediately. In case the object is not in the local cache, the cache broadcasts an ICP request to its sibling nodes. If one or more siblings have the object, it is retrieved from the sibling with the minimum recorded delay, sent to the user and stored in the cache. If no sibling node has the object, the request is forwarded either to the parent cache or directly to the origin server. Forwarding the request to a parent simply repeats the above process. If the request fails at every level of the hierarchy, the object is retrieved from the origin server and then passed downwards in the hierarchy to the leaf node that created the initial request. All nodes in the tree of the request update their caches with the requested object. A main problem with cache hierarchies is that they may introduce a significant additional delay to requests. If a client asks for an unpopular object that is not stored in any cache, the request must still be forwarded through the hierarchy to the higher levels before the top-level cache sends the request to the origin server. Also, there is no guarantee that the caches higher in the hierarchy are actually closer to the origin server.
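The ICP resolution order just described (local cache, then siblings, then parent, then the origin server) can be illustrated with a toy Python model. The sketch is a deliberate simplification: it omits ICP timing (picking the first sibling hit rather than the sibling with the minimum measured delay) and assumes a simple tree, but it does store the object at every node on the return path, as the protocol description above requires.

```python
class HierCache:
    """Node in a cache hierarchy; resolution order mimics ICP:
    local store, then siblings, then parent, then the origin server."""
    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.store = {}
        self.parent = parent
        self.siblings = []
        # leaf nodes inherit the origin-server callable from the root
        self.origin = origin or (parent.origin if parent else None)

    def lookup(self, key):
        if key in self.store:                      # local hit
            return self.store[key], self.name
        for sib in self.siblings:                  # ICP query to siblings
            if key in sib.store:
                self.store[key] = sib.store[key]   # cache the sibling's copy locally
                return self.store[key], sib.name
        if self.parent:                            # forward to the parent cache
            value, source = self.parent.lookup(key)
        else:                                      # top of hierarchy: go to origin
            value, source = self.origin(key), "origin"
        self.store[key] = value                    # every node on the path caches it
        return value, source
```

A two-level hierarchy (a national parent with two regional children that are siblings of each other) is enough to see both the downward caching on a miss and a subsequent sibling hit.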
This problem is usually solved by filtering the requests sent to the top levels of the hierarchy. For example, a nationwide cache may accept requests only for objects whose country of origin (as determined from the URL) is abroad. Moreover, the usage of caching (either in stand-alone proxies or in hierarchies) has been associated with a series of problems. The main one is that the content provider loses control over its content once it is retrieved from the origin server and placed in a cache. Thus, content providers often prefer to
characterize the content as non-cacheable so as to prevent it from being copied to intermediate caches between the origin server and the client. Moreover, providers of dynamic content avoid caching due to the risk of delivering to the end user content that has not been updated with the latest changes made on the origin server. There are some simple mechanisms to inform caches about changes in the content, such as the "Expires" header or the HTTP/1.1 cache-control mechanisms, but these have not seen widespread deployment. Also, simple caching leads to a loss of business knowledge about the visitors of a content provider's website. The accuracy of reports about visitor access and navigation on the site plays a crucial role in marketing and sales promotion. Caching can result in a loss of statistical accuracy, since only a small portion of user requests are delivered to the origin server. In addition, a variety of e-marketing techniques are based on the personalization of content using cookies. Google, for example, uses cookies in order to display user-specific advertisements. Finally, caching services may create new security challenges, such as breaches of confidentiality, integrity and authentication for sensitive data such as medical records, when they are not under the direct control of their providers.
2.3. Content Distribution Networks
2.3.1. Distribution services and content delivery
Content distribution and delivery services provide a higher level of service that complements and extends the Internet by proactively putting premium content as near as possible to the end users and by forwarding each user request to the best available server. Trying to make a distinction between the terms content distribution and content delivery, we could define content delivery as the service provided directly to end users, and content distribution as the transmission of content from a central server to multiple regional servers installed at the edges of the network.
Distribution and content delivery networks have emerged as added value over the existing Internet infrastructure, offering new possibilities for processing content flows through the identification of content type, the routing of requests to the optimal server and the dynamic creation of content. The goal of the research community was to create content-aware networking, where the network elements have the necessary intelligence to recognize the specific content requested, allowing in this way optimal request routing and content delivery. A Content Distribution Network is a collection of intermediate caching servers that remove workload from the origin servers by delivering web content on their behalf. The servers belonging to a CDN can be placed in the same physical location as the origin server, or in different locations in the network, close to the end user. There are two types of CDN, depending on the percentage of the contents of the origin server that is copied to the CDN caches. If all the contents of the origin are copied, the CDN uses a full site replication scheme; when only a part of the contents of the origin web site is copied, the CDN uses a partial site replication scheme. In the partial site replication scheme, mostly static and large-size objects such as images, graphics, etc. are copied to the CDN caches.
2.3.2. Content Distribution Network Benefits
The main benefits of using Content Distribution Networks in comparison with traditional best-effort transmission services can be grouped into the following categories: Increasing the bandwidth: CDNs preserve network bandwidth, since requests are served by local cache servers. Buying and selling bandwidth: For ISPs and network service providers, shifting traffic from the backbone to the local loop is a critical success factor.
With local storage and playback of content, ISPs can use local bandwidth, which has a lower cost than backbone bandwidth, and effectively manage the cost of delivering content with high bandwidth requirements.
Improving efficiency: By caching popular content closer to the end user, delay and jitter are reduced. Flash crowds: When web pages or event broadcasts are simultaneously accessed by large crowds, proactive replication and caching of content in regional caches distributes the workload across the Internet and prevents congestion at the origin servers. Of course, CDN quality of service depends on the size of the CDN, i.e. the number of sites where caches are installed by the service provider. The existence of more regional caches in the CDN implies a smaller average distance from the end user to the cache. Since CDNs allow access to information to be made locally rather than through backbone networks, they are in direct competition with backbone service providers. This competition is strongly influenced by the relative prices of storage and bandwidth. Relatively higher prices for bandwidth in comparison with disk storage prices favor the usage of CDNs.
The mechanisms used for routing requests are HTTP, IP, or DNS redirection. HTTP allows servers to redirect a client request to a different location. Anycast is a network addressing and routing methodology in which datagrams from a single sender are routed to the topologically nearest node in a group of potential receivers all identified by the same destination address [1][63]. DNS redirection and HTTP redirection (URL rewriting) are the easiest and most popular methods for implementing load balancing. DNS redirection is completely transparent to the user, as opposed to URL rewriting. Finally, a rather new method for redirecting user requests that is widely used in mobile telephony is the location-based service (LBS). LBSs include services to identify the location of a person or object, such as discovering the nearest banking cash machine or the whereabouts of a friend or employee [64][65]. To perform these functions effectively, an integrated CDN must include the following fundamental components: A content distribution and management mechanism, which distributes content to nodes at the edge of the network. The central management determines the provision of content and policy settings for all nodes in the CDN and is responsible for synchronizing the content between the origin server and the cache servers. A content routing mechanism, which redirects user requests to the most suitable nodes of the CDN, based on measurements that include real-time delay, the network topology, the load of servers and policies. A content switching mechanism, which performs load balancing. The switching elements determine how to redirect user requests, balancing the availability of content and the workload of servers. A content edge delivery mechanism for delivering content from the network edge to the end user. Intelligent network services that include functions such as quality of service (QoS), virtual private networks (VPNs) and multicasting.
A management framework that allows providers of distribution services to monitor and control procedures as required.
2.3.4. User Redirection Mechanisms
One of the main problems in a distributed content delivery network is to ensure that end users will be able to retrieve the copied content from a regional cache. This is achieved through client redirection mechanisms that redirect user requests to the most suitable/optimal server in the network. Optimality is a policy decision that is mainly based on the proximity of the replica (the proxy server) to the client. Other criteria that may be used are the workload of the servers and the network congestion. There are several ways to direct the client to a replica: manual selection of the replica by the user, HTTP redirection, DNS redirection (which is the most widely used method), and anycast. With DNS redirection, the DNS server of the origin server resolves user requests for the nominal address of the origin server to the IP address of a CDN content server. This matching is carried out based on factors such as the availability of resources and the network status. Anycast is a network addressing and routing methodology in which datagrams from a single sender are routed to the topologically nearest node in a group of potential receivers, all identified by the same destination address. On the Internet, anycast is usually implemented by using BGP to simultaneously announce the same destination IP address range from many different places on the Internet. As a result, packets addressed to destination addresses in this range are routed to the "nearest" point on the net announcing the given destination IP address. Content delivery networks may use anycast for actual HTTP connections to their distribution centers, or for DNS. Because most HTTP connections to such networks request static content such as images and style sheets, they are short-lived and stateless.
The general stability of routes and the statelessness of connections make anycast suitable for this application, even though it uses TCP.
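As a minimal illustration of the server-selection decision that underlies these redirection mechanisms, the following Python sketch scores candidate replicas by proximity to the client and by current load. The scoring function, the replica records and their field names are hypothetical; real CDNs combine far richer inputs (measured RTT, topology maps, link costs, operator policies).

```python
def select_replica(client_region, replicas):
    """Toy request-routing policy: prefer a live replica in the client's
    region, breaking ties by current server load."""
    def score(r):
        distance = 0 if r["region"] == client_region else 1
        return distance * 100 + r["load"]   # proximity dominates, load breaks ties
    candidates = [r for r in replicas if r["alive"]]
    return min(candidates, key=score)

# Hypothetical replica inventory for illustration only.
replicas = [
    {"name": "edge-eu",  "region": "eu", "load": 40, "alive": True},
    {"name": "edge-us",  "region": "us", "load": 10, "alive": True},
    {"name": "edge-eu2", "region": "eu", "load": 15, "alive": True},
]
```

For a client in "eu" this policy picks the less loaded of the two European replicas; if all replicas in the client's region are down, it falls back to the best remote one, mirroring the availability criterion mentioned above.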
Regarding DNS redirection, there are two distinct mechanisms, depending on whether the entire website is replicated or just a part of it. In the first case full redirection is supported, whereas in the second case selective redirection is supported. When full redirection is used, all user requests are redirected via DNS to a replica in the CDN. The main benefit of this mechanism is that all client requests are first sent to replicas and not to the origin servers. Another advantage of this mechanism is that it dynamically adapts to the creation of new hot-spots, since all client requests are redirected to geographically dispersed replicas. When selective redirection is used, the origin server converts the URLs embedded in the replicated objects (web pages) so that the host names in the URLs are resolved to IP addresses that belong to the CDN. The benefit of the selective redirection mechanism is that it reduces capacity requirements (only a part of the data on the origin server is copied to the replicas, usually the most popular objects).
2.4. Policy rules for content delivery networks
2.4.1. Introduction to network policies
In general, a network policy is a set of rules for the administration, management, and access control of network resources. Network policies provide a way to consistently manage, from a central point, multiple devices that implement complex technologies. The basic rule for defining a policy is an expression of the form "If condition X is valid, then do action Y". In a content distribution network, the resource that must be efficiently managed is mainly the available storage in the regional network caches. Therefore, appropriate policies [9] must be established to control the distribution of content between the various nodes of the overlay network. A policy in a content distribution network would be a statement that defines what type of software / data may be
transferred to a CDN node, and what type of software or data must remain on the origin server. With this definition of policies there are only two types of possible actions: a file or program is either storable (it can be executed on a CDN node, or its content can be copied and stored on a CDN node) or non-storable, and must be placed only on the origin server. However, each storage action is accompanied by assumptions about how to preserve the consistency of the stored data. That is, when an action specifies that an object can be stored, it must also specify the consistency model to be used for the stored object.

2.4.2. Consistency Models

Different applications have different requirements on the consistency of the data stored in caches. In the case of static web page storage, there are three different consistency models. The first model, called strong consistency, requires each access to the cached page to return the most up-to-date copy of the page (i.e. the page that is stored on the origin server). This type of consistency is usually guaranteed by the origin server, which pushes all modifications to the tree of replicas. In this case, a modification must be propagated to all network nodes before it is considered complete. A second model, time-limited consistency, requires a cached page to be updated within a specified period of time, called the TTL (Time to Live). After this period expires, the replica must retrieve the object anew from the origin server. A third model, loose consistency, requires objects stored in the cache to be updated as soon as the network is able to do so. This can be achieved by having the origin server create batched updates or invalidations and transmit the batches on a best-effort basis.
Such a protocol may help protect the network from increased load and high bandwidth consumption, and it is sufficient for applications whose content is rarely updated.
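The time-limited (TTL) consistency model described above can be sketched as follows. This is a minimal illustration only; the class and function names are assumptions, not part of any particular CDN implementation.

```python
import time

# A minimal sketch of time-limited (TTL) consistency for a cached object.
# The replica serves its local copy until the TTL expires, after which it
# must renew the object from the origin server.

class CachedObject:
    def __init__(self, url, content, ttl_seconds):
        self.url = url
        self.content = content
        self.ttl = ttl_seconds
        self.fetched_at = time.time()

    def is_fresh(self):
        # Under TTL consistency the copy is valid until the period expires.
        return (time.time() - self.fetched_at) < self.ttl

def get(cache, url, fetch_from_origin):
    obj = cache.get(url)
    if obj is not None and obj.is_fresh():
        return obj.content              # serve the replica's local copy
    # TTL expired (or object missing): retrieve anew from the origin server.
    content = fetch_from_origin(url)
    cache[url] = CachedObject(url, content, ttl_seconds=60)
    return content
```

Strong consistency would instead require the origin server to push every modification to the replica before serving, while loose consistency would let the origin batch invalidations and deliver them on a best-effort basis.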
2.4.3. Alternative CDN policies

Apart from the consistency model, there are other policy decisions to be taken. It is not safe to assume that all data is storable in all locations. The CDN nodes are scattered across different locations and may be subject to restrictions that do not allow them to store specific types of content. Therefore, there must be a policy that restricts the storage operation in specific locations. In many cases the decision to copy applications or content to the nodes of a CDN is determined by the current network status. A CDN node may only be needed when there is a bottleneck in the network that affects content delivery to the end user; in a non-congested network the utilization of the CDN node may be unnecessary. Similarly, some origin servers may require the use of the CDN only when they are overloaded. In these cases the content provider may define a threshold request rate beyond which copying content to the CDN nodes is allowed. For this reason, there should be a separate mechanism for monitoring the load on the system, both at the origin server and at the nodes of the CDN. For example, a performance monitoring system can use the information on the server load and classify the server as idle, moderately loaded or overloaded. Copying and storage of content or applications to the nodes of the CDN would then be allowed only when the server load is considered high enough.

2.5. Edge Services

The emergence of content delivery networks not only accelerates the process of content delivery on the Internet and in corporate networks, but also enables the delivery of specialized services at the network edge. These services are called edge services. Edge services exploit the infrastructure of the CDN and overlap to some extent with the services of the CDN. They can range from content adaptation based on
the profile of an individual client, to assembling content and providing virus-scanning protection. Providing edge services requires developing the appropriate technology and obtaining the intelligence needed to produce dynamic content pages. The traditional architecture of websites and web servers requires the use of the same infrastructure for both the creation and the delivery of content to users. For static pages and static content this architecture is satisfactory; however, creating dynamic content increases the load on a website. We therefore need a way to separate the content delivery process from content creation and assembly. Several attempts have been made in this direction in recent years, either by companies operating in the content industry or by independent organizations that create Internet standards. Edge Side Includes (ESI) [14] is a result of these efforts. The goal of ESI is to solve the performance problems inherent in caching content by accelerating the execution of dynamic web applications. ESI is a simple XML-based markup language intended to describe the cacheable and non-cacheable components of a website. ESI is not intended to replace HTML or other languages used to create dynamic content; rather, it coexists with them to separate the static from the dynamic part of a page. The static part of a page, in essence a template, can then be stored in the cache, and the cache may request only the dynamic content from the origin server. The assembly of the page is done in the processing nodes at the network edge, where the cache is located.

2.6. Application Content Distribution Networks

2.6.1. Introduction to ACDN

An Application Content Delivery Network (ACDN) [10] is a CDN which improves access to dynamic content that cannot be stored in caches. An ACDN initially allows the installation of the application on only a single node anywhere on the
network and then copies and transfers the application to where it is needed, in accordance with the observed demand. Unlike a traditional CDN, which serves only static content either from a local cache server or directly from the origin server, an ACDN must have the required computing environment (the specific application, including executable files and data) in order to process a request. Transferring and installing the application upon each request is not a practical solution to the problem. For this reason an ACDN may distribute requests only among servers that currently hold an up-to-date copy of the application.

2.6.2. ACDN Requirements

Beyond the classic requirements of a traditional CDN, an ACDN must provide solutions to the following problems:

- Application distribution framework: An ACDN needs a mechanism to dynamically install a copy of the required application on a server and to maintain its consistency with the original copy. The latter issue is complicated by the fact that an application consists of multiple components that may have different versions; all components of the application must be updated for it to function properly.

- Content placement algorithm: An ACDN must decide which applications to install, on which servers, and when.

- Request distribution algorithm: In addition to the load and proximity factors that are taken into account in traditional CDNs, the delivery mechanisms of an ACDN must also take into account on which nodes the requested application is available.

The distribution of an application in an ACDN may be achieved using a metafile of the application consisting of two parts: a list of all files included in the application along with the date of last modification for each file, and an initialization script which includes all actions to be performed by the CDN node before accepting any requests. The metafile has its own URL and receives treatment similar to any other static content.
Thus, using the application metafile, the problem of maintaining the consistency of the application is reduced to the corresponding problem of maintaining the consistency of a single static object, the metafile.
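The metafile idea above can be sketched as follows. The structure, field names and date format are illustrative assumptions; an actual ACDN defines its own metafile format.

```python
# A minimal sketch of an ACDN application metafile: a list of the files in
# the application with their last-modification dates, plus an initialization
# script run by the CDN node before it accepts requests. All names here are
# hypothetical, for illustration only.

metafile = {
    "files": {
        "app/server.py":  "2013-01-10T12:00:00",  # path -> last modified
        "app/schema.sql": "2013-01-08T09:30:00",
    },
    # Actions the CDN node must perform before accepting any requests.
    "init_script": ["install_dependencies", "load_schema", "start_server"],
}

def stale_components(local_versions, metafile):
    """Return the application files whose local copy is out of date.

    Because the metafile is itself treated as a static object, keeping the
    whole application consistent reduces to keeping this one file fresh.
    """
    return [path for path, modified in metafile["files"].items()
            if local_versions.get(path) != modified]
```

A node that finds any stale components would re-run the relevant parts of the initialization script before serving requests again.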
3. Content Distribution Business Models

3.1. Business Models

Business models are in general one of the most debated and least understood aspects of the Web. The changes brought about by the Internet to traditional business models have been the subject of lengthy debates and recriminations [24]. A business model defines the process by which a company conducts its functions and produces revenue; it clarifies the way in which a company generates revenue by identifying its position in the value chain. More specifically, there are three key elements that define a business model [28]:

- an architecture for the flows of goods, services and information, including a description of the various business players and their roles,
- a description of the potential benefits for the various business players, and
- a description of the sources of revenue.

In the distribution and delivery of content on the Internet, perhaps the best-known business model is the one followed by Akamai [41], the market leader. According to this model, a content provider pays a content distributor, a company that operates Content Distribution Networks, to ensure the storage of copies of the content on servers located in data centers that are closer to end users than the origin servers. Notably, content distributors do not possess their own network infrastructure for data transfer; they rent the network infrastructure of third parties, in essence creating an overlay network. Apart from this model, there is a variety of business models which are distinguished based on the following key points:

- the identity of the entity that benefits from the content distribution service,
- the way content is selected and copied to the replicas, and
- the direction of cash flow.
3.2. Content Distribution Business Chain

The business chain of content distribution on the Internet comprises six different categories of entities [43]:

- Content Providers: wish to bring their content closer to end users.
- Hosting Providers: operate data centers. They provide a secure and reliable infrastructure that guarantees high availability of servers and connection to a high-speed backbone network.
- Backbone Providers: entities that offer high-speed Internet transport. They operate core networks that are interconnected at Network Access Points.
- Internet Access Providers (ISPs): provide Internet connectivity to end users.
- Content Distributors: provide content delivery services to content providers.
- End Users: consume Internet services.

Figure 4 Content distribution chain
These six groups of players comprise the content distribution chain. The purpose of this chain is to connect consumers with content providers. Despite possible overlaps between Internet access providers, hosting providers and backbone operators, their roles, as described above, are distinguishable. The interrelationship between these entities is illustrated schematically in Figure 4, where the arrows indicate the direction of data flow. Notably, the content distributor has to cooperate with almost all entities in the chain.

3.3. Content Distribution Business Models

Two primary business models have been developed in the area of CDNs ([25], [26]). The first focuses on meeting the needs of content providers and is known in the literature as the Content-Centric CDN. In the second model, the emphasis is on meeting the needs of ISPs, and it is therefore referred to as the Access-Centric CDN. The two models are similar in that the entity that is charged requires the satisfaction of content consumers. Their differences lie in the identity of the entity that is charged for the service, the identity of the entity selecting the content to be distributed through the CDN, and the way of distribution (what percentage is copied to regional servers, etc.). In the first model (content-centric), the content provider pays the content distributor for delivering content to its customers, in order to reduce its costs for bandwidth and data center hosting. In the second model, the access provider is charged by the content distributor for saving bandwidth and providing better and faster service to end users. It should be noted that whereas the direction of data flow is from the content providers to the end users, the flow of money is not always the same: in the content-centric model, cash flows from content providers to content distributors, whereas in the access-centric model cash flows from Internet access providers to content distributors.
3.3.1. Content-centric model

This is the model that has been adopted by the majority of companies involved in the distribution of content, such as Akamai and Speedera. In this model, the profit of content distributors comes from content providers, who are charged for the use of the CDN to facilitate the distribution of their content. The selection of the content to be copied to CDN servers is done by the providers. The customers of the service, that is the content providers, are charged according to the amount of content delivered via the CDN, possibly with a minimum monthly fee. In the contract signed between the content provider and the operator of the CDN, the service provider guarantees uninterrupted and continuous delivery of content to end users through the regional caches of the CDN. CDN servers are installed in data centers belonging to third-party entities, such as Internet service providers, hosting providers or backbone network operators. Usually content distributors pay rent for the space occupied by their servers, the required power supply and network access. Sometimes, however, these are provided free of charge [44]. More specifically, by installing CDN servers on the premises of an ISP, both parties benefit: the ISP acquires direct access to popular Internet content carried over the CDN and therefore needs less backbone bandwidth, since it is no longer required to retrieve the content from the original source. So, through peering agreements, the content distributor offers free use of its servers to the ISP, and in return the ISP hosts the CDN servers at a very low price or even completely free of charge. Content is then served to the end users through the ISP. Apart from the costs of placing CDN servers in a data center, content distributors are also charged by core network operators for interconnecting the CDN overlay. This charge is based on the amount of content moved through the core network.
Moreover, in some cases, core network operators resell CDN services to their own customers through their own network, taking a share of the content distributors' revenue.
Figure 5 Content-Centric CDN Model

3.3.2. Access-centric model

In this business model, the revenue of content distributors comes from ISPs that serve their subscribers with popular content copied to regional CDN caches. The selection of content is now independent of the identity of its owner, and is based on its popularity as recorded through demand. This model is based on the idea that the cost of storage is lower than the cost of bandwidth. By copying frequently accessed content to servers closer to end users, there is a double gain: first, the distribution process is accelerated; second, ISPs lower their biggest expense, the consumption of network bandwidth. Moreover, the positive effects of the access-centric model increase with scale: as the number of users grows, so does the probability that common user requests are served by the regional servers. However, the access-centric model proved not to be viable. The factors that contributed to this outcome are:

- The poor economic situation of ISPs, a result of the intense competition between them, which pushes prices down and leaves little margin for profit. In such a business environment, paying for content delivery is deemed unnecessary, as its cost cannot easily be passed on to the end user.
- Content-centric CDNs provide popular content distribution with the same benefits for free, in return for placing their servers on the premises of Internet service providers.
- In some cases ISPs use their own cache servers for caching popular content.

Figure 6 Access-centric model
3.3.3. Alternative business models

Alternative business models that have been developed by companies owning network infrastructure are based on the concentration of many of the roles described earlier in one entity. The rationale behind these models is that the increased quality requirements of today's dynamic and streaming content require the participation of backbone operators in the process of content distribution. Content distribution systems that are based on an overlay network of servers, without proprietary network infrastructure, cannot cope with these increased requirements. Moreover, the concentration of the roles of hosting provider, backbone operator, Internet access provider and content distributor in one entity provides the advantage of end-to-end control of service quality. Alliances of companies have emerged that compete to define open standards for content distribution, such as Content Bridge [60], propelled by the software company Inktomi, Content Exchange, promoted by Cisco, and the Broadband Content Delivery Forum, supported by the telecommunications equipment vendor Nortel. The technology of content peering, supported by the Content Alliance [61], follows a similar logic, allowing collaboration between different CDNs (request forwarding and redirection from one overlay network to another). Regarding Content Bridge specifically, its aim is to create an end-to-end agreement for the sale and provision of content distribution services. This alliance allows owners of data centers, network operators and Internet service providers to offer content delivery beyond the boundaries of their own networks, giving them a share of the revenue in exchange for services provided by their own resources. Moreover, it grants content providers control and transparency regarding the content delivered by local servers.
Figure 7 Content Bridge model

The business model promoted by the alliance is based on:

- owners of data centers, taking advantage of existing commercial relationships established with content providers,
- cooperating network operators for the delivery of the content, and
- Internet service providers, who store the content on local servers.

3.3.4. Peer-to-Peer (P2P) model

The main difference of the P2P model from the previous models lies in the definition of the edge of the network. In the business models presented so far, the edge of the network where content delivery is performed is either at the perimeter of the backbone network or within the ISP network; the first case arises when the CDN is operated by one entity, while in the second case the operator of the CDN is an independent business entity. With the emergence of the new P2P CDN business model, however, the
edge of the CDN is moved even further out, to include the computers of the users, who now share content among themselves. The idea behind the model is to convert all its nodes into well-behaved servers. End users become content distributors, delivering content from the "user edge" as opposed to the "network edge". This is achieved by installing special software on the client machines of end users, allowing them to exchange content. Content can be sent to users from central servers during periods of low traffic, thereby reducing bandwidth costs and server load. The computers of the first users to request some content retrieve that content from the central servers of the CDN provider and then act as caches for subsequent users who request the same content. The main source of revenue for content distributors in the P2P CDN business model are the content providers. The charge is justified by the initial copying of the content to the central servers of the CDN and its subsequent distribution to end users. A secondary source of revenue are the end users, who obtain the software either for a fee or completely free of charge, the latter in order to quickly create a large-scale distribution network. Additionally, there may be charges for the bandwidth used to transfer content between the end-user nodes and the central servers of the P2P CDN. This fully distributed peer-to-peer approach enables the cost-effective delivery of large files and streaming content to end users. Compared with traditional content distribution, where each new user is an additional burden on the network, P2P networks benefit in performance as they grow larger. Since each user is in essence a new network node, more user participation increases the available storage space for caching, thus improving the performance of the whole network.
Moreover, the CDNs that are established in the global market are usually based on the deployment of a global and expensive-to-maintain network of tens of thousands of dedicated caching servers. The effectiveness of these networks depends directly on the physical proximity of the proprietary caches to end users. In contrast to the above models, the P2P content distribution network model is primarily based on software. The P2P CDN model also finds application in the intranets of large organizations.
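The P2P delivery mechanism described above can be sketched as follows. This is a simplified illustration under stated assumptions: all names are hypothetical, and real P2P CDNs add peer selection, integrity checks and incentive mechanisms.

```python
# A minimal sketch of the P2P CDN idea: the first user to request an object
# retrieves it from the CDN's central servers, then acts as a cache (peer)
# for subsequent users requesting the same object.

class P2PCDN:
    def __init__(self, origin_fetch):
        self.origin_fetch = origin_fetch   # callback to the central servers
        self.peers = {}                    # object url -> users holding a copy

    def request(self, user, url):
        holders = self.peers.setdefault(url, [])
        if holders:
            source = f"peer:{holders[0]}"  # served from the "user edge"
        else:
            self.origin_fetch(url)         # first request hits the origin
            source = "origin"
        holders.append(user)               # the requester now caches the object
        return source
```

Note how each new participant adds cache capacity, which is exactly why such networks can improve in performance as they grow larger.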
Figure 8 P2P CDN model

An example of a free P2P CDN is Coral CDN [65], which comprises a world-wide network of web proxies and name servers that run on PlanetLab [66] nodes across the globe.

3.3.5. Pay-per-view model

The pay-per-view CDN business model requires users to pay for the content they consume. This model may be used in video streaming (sporting events, movies, etc.). It may find wide application in the future, since the majority of end users now have broadband access and low-cost (fixed-rate) bandwidth; in essence it requires the distribution of popular content that users are willing to pay for. An alternative model that can support the delivery of content in a manner similar to that of conventional television is the use of advertisements. This is an indirect way of charging users, since advertisements are inserted into the content and may be personalized according to the interests and geographic region of the users. In essence, this model is not so different from the one used by YouTube today: Google uses advertisements to support the maintenance of thousands of servers spread throughout the world, and still makes a profit.
4. The Cloud

The basic principle of cloud computing is to support easy network access to a pool of configurable computing resources (such as networks, servers, storage resources, applications and services) that are available with minimal management effort and minimal interaction with the provider of the service. This model fosters availability and scalability. The concept of cloud computing is a new approach in the field of distributed systems that builds on technologies that already existed. The Cloud infrastructure currently consists of services offered by data centers. Some of the main features of the Cloud are the minimization of the customer's investment cost, the use of better-quality software at lower cost, and the ability of users to use computing technology regardless of their location or the tools they have available. The intended effect of the development of cloud computing is the concentration of computing power in less space with low installation and operating costs. While the Cloud promises many benefits to companies and individuals, it involves some serious risks related to data security.

4.1. Cloud computing features

The essential features of cloud computing are the following:

- On-demand self-service: The user can unilaterally provision computing resources, such as server CPU time, storage and network resources, automatically and as needed, without requiring interaction with the provider of the service.
- Broad network access: Cloud resources are available over the network and are accessed through standard mechanisms that support heterogeneous terminal devices on the end-user side (such as mobile phones, laptops and PDAs).
- Resource pooling: The provider's computing resources are pooled to serve multiple clients in parallel using the multi-tenant model, with different physical and virtual resources dynamically assigned to customers based on demand. The user has no control over or knowledge of the exact location of the provided resources, but may be able to specify the location at a relatively abstract level (country, region or data center). These resources can be storage, computing power, memory, bandwidth and virtual machines.
- Rapid elasticity: Resources can grow very quickly and in a very flexible way, in many cases automatically, so that no interaction is needed between the customer and the provider of the service.
- Measured service: Cloud systems can automatically monitor and optimize the available resources using a metering capability appropriate to the type of service offered.
- Sharing of infrastructure: The Cloud makes extensive use of virtual machines, so end users can obtain greater benefits with fewer hardware resources, and service providers achieve better utilization of their resources.

4.2. Cloud Computing Services

The services offered by the Cloud are the following:

- Cloud Software as a Service (SaaS): The customer uses applications installed in the Cloud, accessed through interfaces or tools such as a web browser. The user is neither authorized nor required to manage the resources used by the application (servers, operating systems, storage, or even specific application configuration settings). Simple examples are webmail and the Google Docs service.
- Cloud Platform as a Service (PaaS): The user may develop applications on the Cloud infrastructure using programming languages and tools supported by the provider.
PaaS providers usually offer a bundle of software and infrastructure in the form of a programmable environment, providing the end user with a platform that hosts the user's own applications or services.
- Cloud Infrastructure as a Service (IaaS): The user has control over basic computing resources and applications. The user does not have the authority to control the underlying Cloud infrastructure, but is authorized to control the operating system, install or develop applications, and may also have limited control over some network resources such as firewalls.

Figure 9 The stack of the Cloud

4.3. Cloud Deployment Models

The deployment models of the Cloud are the following:

- Private cloud: The cloud infrastructure belongs exclusively to a single organization. It may be managed by the organization itself or by a third party, and is usually located on the premises of the organization. In their effort to develop private clouds, organizations implement virtualisation within their own data centers. A private cloud offers the potential for greater security than a public cloud, but is usually more expensive.
- Community cloud: The cloud infrastructure is shared among several organizations and supports pre-defined communities that may have common requirements for security, functionality, etc. The promise of community clouds is that multiple independent entities gain the cost benefit of a common non-public cloud, while avoiding the security and regulatory concerns that may exist in a public cloud.
- Public cloud: The cloud infrastructure is available to the public or to a large group of organizations and businesses, and belongs to an organization that manages the cloud services. The most common forms of public cloud are those accessible through the Internet. In recent years there has been tremendous growth in the public cloud, resulting in a large offering of IaaS services from companies such as Amazon with Amazon EC2, IBM with Blue Cloud, and Rackspace Cloud. Well-known PaaS offerings are Google App Engine and Windows Azure.
- Hybrid cloud: In this case, the cloud infrastructure is a composition of two or more clouds (private, community or public) that remain separate entities but allow the transfer of both data and applications (e.g., cloud bursting for load balancing between clouds).

Figure 10 Cloud deployment models
4.4. The Cloud and CDN

Before examining how the Cloud can be used in the CDN business models, we will discuss the role of the content distribution chain in content delivery on the Internet. Cache servers are typically installed at the premises of ISPs, because end users are directly connected to ISPs. A content provider, however, cannot contract with every ISP in order to reach its wide range of customers. The role of the content distribution service is to act as middleware between the ISPs and the content provider; in essence it is a mechanism for the collection and subsequent distribution not only of the content but also of the cash flow. The content distributor collects fees from content providers and pays a fee to backbone network operators for data transfer and to ISPs for hosting its machines (unless hosting is free of charge in exchange for the use of the stored content). Each of the CDN business models analyzed earlier has its pros and cons. The "traditional" content-centric CDN operating model has positive elements, such as peering agreements with ISPs for the free installation of caches, but also weaknesses. The main ones are the major operating expenses for the maintenance of a global infrastructure of caches, and the lack of end-to-end control of the service, as the transfer of content is done through third-party networks. For this reason, the companies adopting this model must develop strong alliances and cooperation agreements with network service providers, on whose effectiveness their own performance depends to a large extent. To solve the problem of full control of the content delivery service, alternative models have been suggested, such as Content Bridge, which rely on the cooperation of many entities to provide an integrated service. The drawbacks of these models are mainly the complexity and the difficulty of managing the partnerships and individual agreements between the parties involved.
On the other hand, in recent years a new type of service has emerged that can provide Internet-enabled content storage and delivery capabilities across several continents. Cloud storage providers usually offer high performance and high availability, guaranteed by Service Level Agreements (SLAs). The client of such a service pays a rather low fee for utilizing the Cloud storage, typically in the order of cents per gigabyte [72]. Examples of costs for well-known Cloud providers are shown in the next figure.

Figure 11 Prices of well known Cloud providers (year 2008) [61]

Large enterprise customers typically utilize pervasive and high-performing Content Delivery Networks such as Akamai, which maintains a global infrastructure of thousands of servers. However, such large-scale CDNs are priced out of the reach of most small to medium enterprises (SMEs), government agencies, universities and charities [72][75]. As a result, the idea of utilizing the Cloud as a rather inexpensive CDN is promising. With "pay as you go" pricing, Cloud-based CDNs have the ability to satisfy both flash-crowd events and anticipated increases in demand. The Cloud can therefore provide a rather inexpensive coupling of web hosting services with content delivery: the services provided by the CDN can be considered a distributed extension of a centralized hosting service, and since Cloud providers usually possess data centers across the globe, it is possible for them to provide content distribution services. Another positive consequence of this coupling for Cloud providers is customer lock-in: the greater the range of services supplied by a single provider, the lower the likelihood that a customer will switch providers in the future, or produce those services in-house instead of buying them.
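The "pay as you go" pricing described above can be illustrated with a rough cost sketch. The per-gigabyte prices below are hypothetical placeholders, not actual provider rates; real prices vary by provider, region and usage tier.

```python
# A rough "pay as you go" cost sketch for Cloud-based content delivery.
# Both price parameters are assumed placeholder values for illustration.

def monthly_cost(gb_stored, gb_transferred,
                 storage_price_per_gb=0.10, transfer_price_per_gb=0.15):
    """Estimate a monthly bill: storage plus outgoing transfer charges."""
    return gb_stored * storage_price_per_gb + gb_transferred * transfer_price_per_gb

# Example: an SME serving 50 GB of content with 2 TB of monthly traffic.
cost = monthly_cost(gb_stored=50, gb_transferred=2000)
print(f"Estimated monthly cost: ${cost:.2f}")  # prints "Estimated monthly cost: $305.00"
```

Because the bill scales directly with usage, an SME pays nothing for idle capacity, which is precisely what makes the Cloud attractive compared with a contract for a dedicated large-scale CDN.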
The coupling of the Cloud with CDN hosting services is a natural threat to standalone CDNs. This combination is known as a "one-stop shop" and is quite enticing for content providers, as it is less costly. In the next figure, we present the Cloud CDN business model.

Figure 12 CDN cloud model

4.4.1. Examples of Cloud CDN services

Economies of scale, in terms of cost effectiveness and performance for both providers and end users, may be achieved by utilizing pre-existing Storage Cloud infrastructure, instead of investing large amounts of money in a dedicated content delivery platform or utilizing one of the incumbent operators such as Akamai. Examples of such services already available are CloudFlare [78], CloudLayer [79], Cloud Files [80], and HP Cloud CDN [81].

CloudFlare [78] provides a Cloud-based CDN that currently comprises 23 data centers around the world. The content delivery overlay network is designed to reduce hops and minimize network latency: on average, a request traverses fewer than 10 hops and takes less than 30 ms. CloudFlare does not bill customers for bandwidth usage. The customer pays a flat rate per plan, which is based on functionality. The
more expensive plans provide Service Level Agreements for high availability, dedicated support, and customization.

CloudLayer Content Delivery Network (CDN) currently distributes content through a network of 24 nodes throughout the cloud, in an effort to put content geographically closer to the end users of small to medium-sized enterprises. CloudLayer CDN includes robust tools for digital rights management and content monetization.

Cloud Files, powered by OpenStack, provides an integrated solution of online storage for files and media, with content delivery through Akamai's CDN. Cloud Files uses 213 of Akamai's edge locations in an effort to satisfy demand on a worldwide scale while keeping costs as low as possible. Cloud Files is in essence an alliance between a Cloud provider and a CDN provider, aimed at minimizing costs while achieving low network latency. The drawback of this solution, compared to CloudFlare and CloudLayer, is that customers are charged for both storage and outgoing bandwidth.

HP Cloud CDN is another integrated solution of online storage for files and media combined with content delivery. It provides static data delivery from HP Cloud Object Storage to users around the world by caching user data across the HP and Akamai global networks. As in the case of Cloud Files, customers are charged for both storage and outgoing bandwidth.

It is clear that solutions such as CloudFlare and CloudLayer provide a "poor man's CDN", since they avoid the major operating expenses of maintaining a global infrastructure of caches, but they do not provide end-to-end control of the service, as content is transferred through third-party networks. On the other hand, solutions such as HP Cloud CDN and Cloud Files provide an integrated solution of online storage for files and media with content delivery through Akamai's CDN. They are more expensive, but provide end-to-end control of the service.
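The trade-off between a flat-rate plan and storage-plus-bandwidth billing, as described above, can be sketched as a simple break-even calculation. The rates here are again hypothetical, chosen only to illustrate that usage-based billing tends to win at low traffic volumes while a flat plan wins at high ones.

```python
# Hypothetical break-even comparison of the two billing styles described
# above. None of these rates are real provider prices.

def flat_rate_cost(served_gb, plan_fee=20.0):
    return plan_fee  # bandwidth is not billed; only the monthly plan fee

def usage_based_cost(stored_gb, served_gb,
                     storage_rate=0.10,     # $/GB stored per month (assumed)
                     bandwidth_rate=0.12):  # $/GB of outgoing traffic (assumed)
    return stored_gb * storage_rate + served_gb * bandwidth_rate

for served in (50, 200, 1000):  # GB of outgoing traffic per month
    flat = flat_rate_cost(served)
    usage = usage_based_cost(stored_gb=20, served_gb=served)
    winner = "flat plan" if flat < usage else "usage-based"
    print(f"{served:5d} GB served: flat ${flat:.2f} vs usage ${usage:.2f} -> {winner}")
```

Under these assumed rates, a low-traffic customer is better served by usage-based billing, while the flat plan caps costs once outgoing traffic grows, which matches the positioning of the services discussed above.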
5. Conclusions

Caching services provide the basic operation of copying popular content from a source server to a local server, thereby reducing the distance between the user and the requested content and thus accelerating delivery. Moreover, caching reduces bandwidth consumption, since the requested content is delivered locally. Content delivery networks, on the other hand, are distributed systems of interconnected servers that form an overlay network. In addition, content delivery systems add another layer of functionality, offering advanced services such as streaming media distribution, data tracking, content personalization and localization, content synchronization with the source server, load balancing, etc.

Caching services are typically passive; that is, a caching server downloads updated content from the source server upon user request, or when the content has been modified since the last save. Alternatively, active caching systems make assumptions about the life expectancy of content according to the frequency of changes or other factors. Content delivery systems, on their part, proactively forward content to edge servers at regular, predefined intervals, thus ensuring the validity of the content. Usually, the content owner provides the content distributor with a list of updated objects to be copied and forwarded to the edge servers.

Caching servers are typically installed at the premises of ISPs, because end users are directly connected to ISPs. A content provider, however, cannot come in contact with every ISP in order to reach its wide range of customers. The role of the content distribution service is to act as an intermediary between the ISPs and the content provider. In essence, it is a mechanism for the collection and subsequent distribution not only of the content but also of the cash flow.
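The passive caching behavior summarized above, serving a saved copy locally unless the origin copy has been modified since it was saved, can be simulated with a minimal in-memory sketch. The class and data layout are illustrative assumptions, not a real proxy implementation; a production cache would use HTTP conditional requests (If-Modified-Since) rather than reading the origin directly.

```python
# Minimal simulation of a "passive" cache: content is downloaded from the
# origin only upon a user request, and refetched only when the origin copy
# has been modified since the copy was saved. The origin is simulated as an
# in-memory dict; the timestamps are simple integer "last modified" stamps.

class PassiveCache:
    def __init__(self, origin):
        self.origin = origin  # URL -> (content, last-modified stamp)
        self.store = {}       # URL -> (content, stamp of the saved copy)

    def get(self, url):
        content, modified = self.origin[url]
        if url in self.store:
            cached_content, saved_stamp = self.store[url]
            if modified == saved_stamp:       # unchanged since the last save
                return cached_content, "HIT"  # serve locally
        self.store[url] = (content, modified)  # download and save a fresh copy
        return content, "MISS"

origin = {"/index.html": ("<html>v1</html>", 100)}  # stamp 100 = last modified
cache = PassiveCache(origin)
print(cache.get("/index.html"))  # ('<html>v1</html>', 'MISS') - first fetch
print(cache.get("/index.html"))  # ('<html>v1</html>', 'HIT') - served locally
origin["/index.html"] = ("<html>v2</html>", 200)    # origin content changes
print(cache.get("/index.html"))  # ('<html>v2</html>', 'MISS') - refetched
```

An "active" variant, as described above, would instead assign each object a life expectancy derived from its change frequency and refresh it before expiry, rather than comparing modification stamps on each request.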
The content distributor collects fees from content providers and pays a fee to backbone network operators for data transfer and to ISPs for hosting its machines (unless hosting is free of charge in exchange for the use of the stored content). Each of the CDN business models analyzed earlier has its pros and cons. The "traditional" content-centric CDN operating model has positive elements, such as peering agreements with ISPs for free installation of caches, but also weaknesses. The main one is the major operating expense of maintaining a
global infrastructure of caches, along with the lack of end-to-end control of the service, since content is transferred through third-party networks. For this reason, companies adopting this model must develop strong alliances and cooperation agreements with network service providers, on whose effectiveness their own performance depends to a large extent. To address the lack of full control over the content delivery service, alternative models have been suggested, such as Content Bridge, which rely on the cooperation of many entities to provide an integrated service. The drawbacks of these models are mainly the complexity and the difficulty of managing the partnerships and individual agreements between the parties involved.

Since CDNs charge high fees for their high-quality services, they are targeted towards large organizations and content providers that need to distribute content globally and efficiently. Independent individuals or small organizations that cannot meet the financial requirements of a CDN are still forced to rely on traditional caching to distribute their content.

For these reasons, the Cloud content distribution model appears as an ambitious and promising answer to the problems of content delivery. The Cloud can provide a rather inexpensive coupling of web hosting services with content delivery. The services provided by the CDN can be considered a distributed extension of a centralized hosting service. Since Cloud providers usually possess data centers across the globe, they are able to provide content distribution services. The Cloud may be a "poor man's CDN" solution, since it avoids the major operating expenses of maintaining a global infrastructure of caches, but it does not provide end-to-end control of the service, as content is transferred through third-party networks.
On the other hand, hybrid solutions that combine Cloud online storage with content delivery through traditional CDNs are targeted at medium-sized enterprises. They are more expensive, but provide end-to-end control of the service.
References

[1] John Dilley, Bruce Maggs et al. Globally Distributed Content Delivery. IEEE Internet Computing, September-October 2002, pp. 50-58.
[2] Jorge Escorcia, Dipak Ghosal, and Dilip Sarkar. A Novel Cache Distribution Heuristic Algorithm for a Mesh of Caches and Its Performance Evaluation.
[3] D. Wessels, K. Claffy. Internet Cache Protocol (ICP), version 2. RFC 2186. National Laboratory for Applied Network Research / UCSD, September 1997.
[4] I. Cooper, I. Melve, G. Tomlinson. Internet Web Replication and Caching Taxonomy. RFC 3040. Network Working Group, January 2001.
[5] Pei Cao, Sandy Irani. Cost-Aware WWW Proxy Caching Algorithms. Proceedings of the USENIX Symposium on Internet Technologies and Systems, Monterey, California, December 1997.
[6] Williams, Stephen, et al. Removal Policies in Network Caches for World-Wide Web Documents. In Proceedings of ACM SIGCOMM '96, 1996, pp. 293-305.
[7] Kopf, David. Will Content Delivery Networks Lead Enterprises Down A Rabbit Hole? Business Communications Review, October 2000, pp. 16-19.
[8] Cisco Systems. Cisco Web Network Services for Content Distribution and Delivery. White Paper, 2000.
[9] Verma D., Calo S., Amiri K. Policy-Based Management of Content Distribution Networks. IEEE Network, March/April 2002, pp. 34-39.
[10] Karbhari P., Rabinovich M., Xiao Z., Douglis F. ACDN: A Content Delivery Network for Applications. AT&T Labs Research. ACM SIGMOD 2002, June 4-6, Madison, WI, USA.
[11] Stardust.com Inc. Content Networking and Edge Services: Leveraging the Internet for Profit. White Paper, September 2001. [URL: http://www.stardust.com]
[12] Edge Side Includes (ESI) Overview. [URL: http://www.edgedelivery.com/esi_overview.pdf]
[13] MacVittie, Lori. Emerging ESI - Lower Costs, Better Performance. Network Computing, 1/7/2002, pp. 62-64.
[14] ESI Language Specification 1.0, W3C Note, 04 August 2001.
[URL: http://www.w3.org/tr/esi-lang]
[15] Krishnamurthy B., Wills C., Zhang Y. On The Use And Performance Of Content Distribution Networks. ACM SIGCOMM Internet Measurement Workshop, 2001.
[16] J. Kangasharju. Distribution de l'information sur Internet (Internet Content Distribution). Doctoral thesis, Université de Nice, Sophia Antipolis, 2002.
[17] J. Kangasharju, J. Roberts, K. W. Ross. Locating Copies of Objects Using the Domain Name System. In Proc. 4th Web Caching Workshop, San Diego, CA, March 1999.
[18] J. Kangasharju, J. Roberts, K. W. Ross. Performance Evaluation of Redirection Schemes in Content Distribution Networks. 5th International Web Caching and Content Delivery Workshop.
[19] J. Kangasharju, J. Roberts, K. W. Ross. Object Replication Strategies in Content Distribution Networks. In Proceedings of WCW'01, Web Caching and Content Distribution Workshop, Boston, MA, June 2001.
[20] Magnus Karlsson et al. A Framework for Evaluating Replica Placement Algorithms. Internet Systems and Storage Laboratory, HP Laboratories Palo Alto, HPL-2002-219, August 2002.
[21] Magnus Karlsson, Mallik Mahalingam. Do We Need Replica Placement Algorithms in Content Delivery Networks? In Proc. of the International Workshop on Web Content Caching and Distribution (WCW), August 2002.
[22] A. Bar-Noy, R. Bar-Yehuda, A. Freund, J. Naor, B. Schieber. A Unified Approach to Approximating Resource Allocation and Scheduling.
[23] R. Cohen, L. Katzir, D. Raz. Scheduling Algorithms for a Cache Pre-Filling Content Distribution Network. IEEE INFOCOM 2002.
[24] Rappa, Michael. Business Models On The Web - Managing the Digital Enterprise. [URL: http://digitalenterprise.org/models/models.html]
[25] Wetzel, Rebecca. CDN Business Models - Not All Cast from the Same Mold. Business Communications Review, April 2001. [URL: http://www.bcr.com]
[26] Wetzel, Rebecca. CDN Business Models - The Drama Continues. Business Communications Review, April 2002.
[27] HTRC Group, LLC. Report.
The Commercial-Grade Internet: Edge Systems and Services. January 2002.
[28] Timmers, Paul. Business Models for Electronic Markets. European Commission, Directorate-General III, April 1998.
[29] John Chung-I Chuang. Economies of Scale in Information Dissemination over the Internet. Dissertation, Carnegie Mellon University, Pittsburgh, Pennsylvania, November 1998.
[30] Terence P. Kelly, Yee Man Chan, Sugih Jamin, Jeffrey MacKie-Mason. Biased Replacement Policies for Web Caches: Differential Quality-of-Service and Aggregate User Value. In Fourth International Web Caching Workshop, San Diego, California, April 1999.
[31] Terence P. Kelly, Sugih Jamin, Jeffrey MacKie-Mason. Variable QoS from Shared Web Caches: User-Centered Design and Value-Sensitive Replacement. In MIT Workshop on Internet Service Quality Economics, Cambridge, MA, USA, December 1999.
[32] Hal Varian, Jeffrey K. MacKie-Mason. Generalized Vickrey Auctions. Technical report, Dept. of Economics, University of Michigan, July 1994.
[33] Yee Man Chan, Jeffrey K. MacKie-Mason, Jonathan Womer, Sugih Jamin. One Size Doesn't Fit All: Improving Network QoS Through Preference-Driven Web Caching. In Second Berlin Internet Economics Workshop, May 1999.
[34] Yee Man Chan, Jonathan Womer, Sugih Jamin, Jeffrey K. MacKie-Mason. The Case for Market-Based Push Caching.
[35] Kartik Hosanagar, Ramayya Krishnan, John Chuang, Vidyanand Choudhary. Pricing and Resource Allocation in Caching Services with Multiple Levels of QoS. Working Draft.
[36] R. T. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee. RFC 2616: Hypertext Transfer Protocol - HTTP/1.1, June 1999.
[37] Brian D. Davison. A Web Caching Primer. IEEE Internet Computing, July-August 2001, pp. 38-45.
[38] G. Barish, K. Obraczka. World Wide Web Caching: Trends and Techniques. IEEE Communications, Internet Technology Series, vol. 38, no. 5, May 2000, pp. 178-184.
[39] J. Wang. A Survey of Web Caching Schemes for the Internet. ACM Computer Communication Review, vol. 29, no. 5, October 1999, pp. 36-46.
[40] Caching Tutorial for Web Authors and Webmasters. [URL: http://www.webcaching.com/mnot_tutorial/intro.html]
[41] Akamai Inc.
[URL]: [http://www.akamai.com]
[42] C. Courcoubetis, R. Weber. Pricing Communication Networks. To appear.
[43] M3I (Market Managed Multi-service Internet). Deliverable 7.1: ISP Business Model Report, 2002.
[44] Akamai Inc. [URL]: [http://www.akamai.com]
[45] Speedera Inc. [URL]: [http://www.speedera.com]
[46] R. Wooster, M. Abrams. Proxy Caching That Estimates Page Load Delays. In the 6th International World Wide Web Conference, April 7-11, 1997, Santa Clara, CA.
[47] P. Lorenzetti, L. Rizzo, L. Vicisano. Replacement Policies for a Proxy Cache.
[48] T. H. Cormen, C. E. Leiserson, R. L. Rivest. Introduction to Algorithms. The MIT Press, 1990.
[49] Joan Feigenbaum, Scott Shenker. Distributed Algorithmic Mechanism Design: Recent Results and Future Directions. Dial-M '02, September 28, 2002, Atlanta, Georgia, USA.
[50] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss. An Architecture for Differentiated Services. IETF RFC 2475, December 1998.
[51] Peter Reichl, Burkhard Stiller, Simon Leinen. Pricing Models for Internet Services. CATI (Charging and Accounting Technology for the Internet) Deliverable, March 1999.
[52] M3I (Market Managed Multi-service Internet). Deliverable 16. Internet Economics Part I: Current Trends, 2002.
[53] MacKie-Mason, J.; Varian, H. Pricing the Internet. 1994.
[54] Lazar, A.; Semret, N. Auctions for Network Resource Sharing. CTR Tech. Report, Columbia University, New York, February 1997.
[55] Ying Lu, Saxena A., Abdelzaher T. Differentiated Caching Services: A Control-Theoretical Approach.
[56] I. Cidon, S. Kutten, R. Soffer. Optimal Allocation of Electronic Content. In Proceedings of IEEE Infocom, Anchorage, AK, April 22-26, 2001.
[57] B. Li, M. J. Golin, G. F. Italiano, X. Deng, K. Sohraby. On the Optimal Placement of Web Proxies in the Internet. In Proceedings of IEEE Infocom 1999, pages 1282-1290, NY, USA, March 1999.
[58] P. Krishnan, D. Raz, Y. Shavitt. The Cache Location Problem. IEEE/ACM Transactions on Networking, 8(5):568-582, October 2000.
[59] L. Qiu, V. N. Padmanabhan, G. M. Voelker. On the Placement of Web Server Replicas. In Proceedings of IEEE Infocom, Anchorage, AK, April 22-26, 2001.
[60] Content Bridge. http://www.content-bridge.com. [61] Content Alliance. http://www.content-peering.org.
[62] Abley, J.; Lindqvist, K. Operation of Anycast Services. RFC 4786, December 2006.
[63] http://en.wikipedia.org/wiki/anycast
[64] Quercia, Daniele; Lathia, Neal; Calabrese, Francesco; Di Lorenzo, Giusy; Crowcroft, Jon. Recommending Social Events from Mobile Phone Location Data. 2010 IEEE International Conference on Data Mining, 2010, p. 971.
[65] Stefan Steiniger, Moritz Neun, Alistair Edwardes. Foundations of Location Based Services. University of Zurich.
[66] Brian Hayes. Cloud Computing. Communications of the ACM, Vol. 51, No. 7, July 2008.
[67] Chang, F., Dean, J., Ghemawat, S., Hsieh, W., Wallach, D., Burrows, M., Chandra, T., Fikes, A., Gruber, R. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI '06), 2006.
[68] Chris Preimesberger. Get off my Cloud: Private Cloud Computing Takes Shape. 4 November 2008 [online]. Available from: http://www.eweek.com/c/a/cloud-computing/why-private-cloud-computing-is-Beginning-to-Get-Traction/
[69] Christina Hoffa, Gaurang Mehta. On the Use of Cloud Computing for Scientific Workflows. Indiana University, University of Southern California, Argonne National Laboratory, Caltech.
[70] Torry Harris. Cloud Computing Services - A Comparison. http://www.thbs.com/pdfs/comparison%20of%20cloud%20computing%20services.pdf
[71] L. M. Vaquero, L. Rodero-Merino, J. Caceres, M. Lindner. A Break in the Clouds: Towards a Cloud Definition.
[72] James Broberg, Rajkumar Buyya, Zahir Tari. MetaCDN: Harnessing Storage Clouds for High Performance Content Delivery. Journal of Network and Computer Applications, Volume 32, Issue 5, September 2009, pp. 1012-1022.
[73] Broberg, J. (2011). Building Content Delivery Networks Using Clouds. In Cloud Computing: Principles and Paradigms (eds R. Buyya, J. Broberg and A. Goscinski), John Wiley & Sons, Inc., Hoboken, NJ, USA.
[74] Chrysa Papagianni, Aris Leivadeas, Symeon Papavassiliou, "A Cloud-Oriented Content Delivery Network Paradigm: Modeling and Assessment," IEEE Transactions on Dependable and Secure Computing, vol. 99, no. PrePrints, p. 1, 2013
[75] D. Rayburn. CDN Pricing: Costs for Outsourced Video Delivery. In Streaming Media West: The Business and Technology of Online Video, September 2008. Available at http://www.streamingmedia.com/west/presentations/smwest2008-cdn-Pricing.ppt
[76] M. Freedman, E. Freudenthal, et al. Democratizing Content Publication with Coral. In ACM Symposium on Networked Systems Design and Implementation. ACM Press, 2004.
[77] M. Fiuczynski. PlanetLab: Overview, History, and Future Directions. ACM SIGOPS Operating Systems Review, 40(1):6-10, 2006.
[78] CloudFlare: http://www.cloudflare.com/features-cdn
[79] CloudLayer: http://www.softlayer.com/cloudlayer/cdn/
[80] Rackspace Cloud Files: http://www.rackspace.com/cloud/files/
[81] HP Cloud CDN: https://www.hpcloud.com/products/cdn