Journal of Algorithms & Computational Technology, Vol. 6, No. 3, p. 483

Real-Time Analysis of CDN in an Academic Institute: A Simulation Study

N. Ramachandran* and P. Sivaprakasam+
*Indian Institute of Management Kozhikode
+Sri Vasavi College, Erode
Email: psperode@yahoo.com
*Corresponding author: raman@iimk.ac.in

Received: 01/03/2011; Accepted: 26/04/2012

ABSTRACT

This paper compares the website performance of a Content Delivery Network (CDN) with that of normal web hosting, using the existing resources available in an academic institute. It also compares the collected real-time data with the output of CDN simulation software. In this experiment, the CDN is implemented with URL rewriting and a reverse proxy. The aim is to improve the response time of the website by coping with flash crowds and to deliver the content of the website efficiently. The significance of this method is that it uses existing local resources and calls for no additional investment in infrastructure by the organization. Both methods and their analyses are described in detail, with a view to identifying the more suitable method.

1. INTRODUCTION

In the past, web content providers sharing content over the Internet paid little attention to the experience of their users, especially response time, because there were very few users. Over the years, however, the number of users accessing web content over the network has increased drastically. Web content providers are often unable to deliver adequate quality of service for a variety of reasons, such as bandwidth, server load, response time, flash crowds, denial of service and system architecture. To address these issues, different technologies have emerged from time to time, such as web caching and Ajax for quick response and Captcha for reducing flash crowds. However, these technologies are not capable of addressing all the complex
issues relating to content sharing. Only the Content Delivery Network (CDN) [1] is capable of resolving all of the above issues. Researchers have widely considered CDNs to be an effective solution [2] to problems of flash crowds, denial of service, server load, bandwidth and system architecture, apart from providing efficient website access with good response time.

Because outsourcing to CDN providers is costly, it is beyond the reach of most small and medium-sized organizations and academic institutes. At the same time, most organizations and academic institutions have many underutilized IT resources. By using CDN techniques in-house, academic institutes and small and medium organizations can put these underused resources to work without any additional financial outlay.

A CDN uses either DNS redirection or URL rewriting. In this context, we have implemented a CDN using URL rewriting and a reverse proxy to address the above issues, looking at the possibilities of improved, dynamic, intelligent and secure content provision with good speed and response time. The paper further compares CDN performance measured in a real-time setup against simulation data generated by the simulation software CDNSim. Only very limited literature is available in this area, and a detailed review of it revealed that no such attempt has been made yet.

2. METHODOLOGY

This section describes the existing IT setup of an academic institute, the simulation software used for this experiment, and the implementation of a CDN in an academic institute using the existing resources for the real-time experiment.

2.1. Overview of Academic IT Setup

Typically, an academic institute has at least one computer laboratory of about 50 desktops, with a minimum configuration of a Pentium IV processor, 512 MB RAM and at least a 40 GB hard disk. Newly set up labs have much more capacity than this, though they do not really need such a high-end configuration. Such configurations are purchased not because of any pressing need or immediate utility but simply because they are what is available in the market. According to data collected from various academic institutes in India, the utilization of these systems is only about 40%. On the other hand, most of their services, such as website hosting, mail server and DNS server, are outsourced on the grounds that infrastructure is lacking.
2.2. Overview of Simulation Software - CDNSim

CDNSim is free, open-source software that provides a modeling and simulation framework for CDNs [3]. According to the software documentation, its main features are:

1. Cooperative push based content management policy;
2. Non-Cooperative push based content management policy;
3. Cooperative pull based content management policy;
4. Non-Cooperative pull based content management policy;
5. LRU cache replacement policy;
6. STATIC cache policy;
7. TCP / IP networking;
8. Wizard for creating self-contained simulations (bottles);
9. Utility for executing unattended simulations;
10. Utility for automatically generating results reports;
11. Utility for extracting statistics related to net-utility;
12. Utility for converting Apache log files into CDNSim trace files;
13. Extensible by implementing modules in the form of libraries.

This software is widely used by renowned institutions around the world. However, CDNSim has some limitations too. It is designed exclusively for CDN setups. It accepts only inputs such as the number of origin servers, number of surrogate servers, number of clients and bandwidth, so the scope for end-user customization is limited.

2.3. CDN Configuration in an Academic Institute

In this setup, the server that acts as the CDN server was placed in the demilitarized zone, though this is not mandatory. The CDN nodes were placed in the militarized zone. If militarized and demilitarized zones are not implemented in the institute, both the CDN server and the CDN nodes can be placed within the LAN. The URL rewriting page and the reverse proxy were configured on the CDN server. The web content was placed on all the CDN nodes. Figure 1 shows the CDN implementation in an academic institute.
The only constraint is that Internet Information Services (IIS), the default web server of the Windows desktop operating system, cannot be used as the web server on the nodes because of its limited connectivity: it supports only 10 concurrent sessions. To overcome this, another web server that supports more concurrent sessions, such as Apache, should be deployed in place of IIS.
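The paper does not list the URL rewriting page itself, so the following is only a minimal sketch of the idea, assuming a simple round-robin choice among nodes. The node addresses and the function names `make_rewriter` and `rewrite` are hypothetical and are not taken from the original setup.

```python
from itertools import cycle
from urllib.parse import urlsplit, urlunsplit

# Hypothetical node addresses; the paper does not give the real ones.
NODES = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]


def make_rewriter(nodes):
    """Return a function that rewrites a URL to point at the next node in turn."""
    ring = cycle(nodes)  # endless round-robin iterator over the nodes

    def rewrite(url):
        parts = urlsplit(url)
        # Keep scheme, path, query and fragment; replace only the host.
        return urlunsplit(
            (parts.scheme, next(ring), parts.path, parts.query, parts.fragment)
        )

    return rewrite


rewrite = make_rewriter(NODES)
```

Each call to `rewrite` sends the same page to a different node, which is how the load of a flash crowd would be spread across the desktops. A production setup would also need to skip nodes that are down, which this sketch does not attempt.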
Figure 1. CDN setup. (Diagram labels: demilitarized zone; firewall with Ports 1, 2 and 3; militarized zone; reverse proxy & URL rewriting; stack switch; five clients, each holding the web content.)

3. EXPERIMENT

In this experiment, three different simulations were implemented. They are:

1. Normal Web hosting Vs CDN Web hosting;
2. CDN setup using simulation Software CDNSim;
3. Real-time CDN Simulation.

3.1. Normal Web Hosting Vs CDN Web Hosting

In this simulation, eight different web hosting methods were examined, using scripts, to analyze the response time on the HTTP port. They are:

1. Normal web hosting using Server.
2. Normal web hosting using Desktop.
3. Server acts as CDN and website hosted in 2 nodes.
4. Node acts as CDN and website hosted in 2 nodes.
5. Server acts as CDN and website hosted in 5 nodes.
6. Node acts as CDN and website hosted in 5 nodes.
7. Server acts as CDN and website hosted in 10 nodes.
8. Node acts as CDN and website hosted in 10 nodes.

Each method was tested 3 times.
In the first method, the website was hosted on a server-class machine. The server's configuration is a dual Xeon 2.3 GHz processor with 1 GB DDR2 ECC RAM. Figure 2 shows the sample data, which was tested with 4742 hits per minute; the average response time was approximately 0.1 milliseconds.

In the second method, the website was hosted on one of the nodes. The node's configuration is a Pentium IV processor with 256 MB DDR1 RAM. The sample data was tested with 4745 hits per minute and the average response time was approximately 0.17 milliseconds.

In the third method, the URL rewriting page and the reverse proxy were configured on the server-class machine and the website was hosted on 2 nodes. The two nodes were a Pentium IV 2.8 GHz with 1 GB DDR2 RAM and a Core 2 Duo (dual core) 2.93 GHz with 2 GB DDR2 RAM. Figure 3 shows the sample data, tested with 2545 hits per minute; the average response time was approximately 0.52 milliseconds.

In the fourth method, the URL rewriting script page and the reverse proxy were configured on a node and the website was hosted on 2 nodes. The sample data was tested with 3671 hits per minute and the average response time was approximately 0.57 milliseconds.

Figure 2. Sample data, web hosting in server.
Figure 3. Sample data, web hosting in CDN with 2 nodes.

In the fifth method, the URL rewriting script page and the reverse proxy were configured on the server and the website was hosted on 5 nodes. Figure 4 shows the sample data, tested with 2529 hits per minute; the average response time was approximately 0.43 milliseconds.

In the sixth method, the URL rewriting script page and the reverse proxy were configured on a desktop and the website was hosted on 5 nodes. The sample data was tested with 4682 hits per minute and the average response time was approximately 0.43 milliseconds.

In the seventh method, the URL rewriting script page and the reverse proxy were configured on the server and the website was hosted on 10 nodes. Figure 5 shows the sample data, tested with 5146 hits per minute; the average response time was approximately 0.39 milliseconds.

In the eighth method, the URL rewriting script page and the reverse proxy were configured on a desktop and the website was hosted on 10 nodes. The sample data was tested with 5146 hits per minute and the average response time was approximately 0.44 milliseconds.
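The measurement scripts used for these eight methods are not reproduced in the paper. As a rough illustration only, a mean response time could be collected along the following lines; the helper names `mean_response_ms` and `http_get` and the example address in the comment are assumptions, not the authors' code.

```python
import time
import urllib.request


def mean_response_ms(fetch, hits):
    """Call fetch() `hits` times and return the mean elapsed time in milliseconds."""
    if hits <= 0:
        raise ValueError("hits must be positive")
    total = 0.0
    for _ in range(hits):
        start = time.perf_counter()
        fetch()
        total += time.perf_counter() - start
    return 1000.0 * total / hits


def http_get(url, timeout=5.0):
    """Return a zero-argument callable that downloads `url` once."""

    def fetch():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # read the whole body so the full transfer is timed

    return fetch


# Example (hypothetical address):
# mean_response_ms(http_get("http://10.0.0.11/index.html"), 100)
```

Separating the timing loop from the fetch keeps the loop independent of the network, so the same function can time a direct request to a node or a request that passes through the reverse proxy.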
Figure 4. Sample data, web hosting in CDN with 5 nodes.

3.2. CDN Setup Using Simulation Software CDNSim

In this simulation, the inputs were given in 5 steps. Figure 6 represents the first step, in which the cooperative environment (closest surrogate) was selected. In the second step, inputs were provided for the routers, link speed, number of clients, number of retries, mean waiting time per retry, number of surrogate servers and origin servers, and number of incoming and outgoing connections of both surrogate servers and origin servers. The input values are illustrated in Figure 7. Figure 8 represents step 3, in which the website object IDs, object sizes and traffic are given as input in text format. Figure 9 represents step 4, which configures each surrogate server's local cache. The configuration is set by a file that describes the contents, the capacity and the cache replacement policy of every surrogate server. The file contains records, each one referring to a surrogate server.
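CDNSim's documentation lists LRU as one of its cache replacement policies. The sketch below shows the general least-recently-used idea for a surrogate cache with a byte capacity. It is an illustration written for this text, not CDNSim's implementation, and the class name `LRUCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded cache that evicts the least recently used object first."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.objects = OrderedDict()  # object_id -> size, least recent first

    def get(self, object_id):
        """Return True on a cache hit and mark the object as recently used."""
        if object_id not in self.objects:
            return False
        self.objects.move_to_end(object_id)
        return True

    def put(self, object_id, size):
        """Store an object, evicting least recently used objects if needed."""
        if size > self.capacity:
            return  # the object can never fit, so it is not cached
        if object_id in self.objects:
            self.used -= self.objects.pop(object_id)
        while self.used + size > self.capacity:
            _, evicted_size = self.objects.popitem(last=False)
            self.used -= evicted_size
        self.objects[object_id] = size
        self.used += size
```

With a capacity of 10 units, storing two 5-unit objects, reading the first and then storing a third evicts the second object, because it is the one that was used least recently.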
Figure 5. Sample data, web hosting in CDN with 10 nodes.

Figure 6. CDNSim policy setup. (Bottle wizard, CDN policy options: cooperative environment (closest surrogate); non-cooperative environment (closest origin); cooperative environment (random surrogate); cooperative environment (surrogate load balance).)

Figure 10 represents the final step, in which the output directory and the name of the bottle in which the collected data will be stored are set. This simulation setup was repeated three times, using 2, 5 and 10 surrogate servers.
Figure 7. CDNSim network topology. (Bottle wizard, step 2/5. Inputs shown: routers file /home/cdnsim/router; link speed 100 Mbits/sec; surrogate servers 10; origin servers 1; clients 100; incoming connections 1000; outgoing connections 1000; number of retries 10; mean waiting time per retry 5.)

Figure 8. CDNSim dataset. (Bottle wizard, step 3/5. Inputs shown: website /home/cdnsim/website/access_log; traffic /home/cdnsim/traffic/trace_file50000.text.)

Figure 9. CDNSim cache management. (Inputs shown: placement /home/cdnsim/placement/pushbased_lru_s; option "Shrink caches' capacity to fit exactly the objects in placements file".)
Figure 10. CDNSim create bottle. (Bottle wizard, final step: output directory /home/cdnsim/output; new bottle's name CDN10.)

3.3. Real-time CDN Simulation

Three different web hosting methods were examined to analyze the performance of the website. They are:

1. Node acting as CDN and website hosted in 10 nodes;
2. Node acting as CDN and website hosted in 5 nodes;
3. Node acting as CDN and website hosted in 2 nodes.

The nodes used in this experiment had Pentium IV processors with 2 GB RAM. Web page access time was collected with the web tool websitepulse.com while the load was around 2500 hits per minute. The hits were simulated by a script. Figure 11 shows the sample data.

Web page test results
URL tested: http://cat2009.iimk.ac.in
Test performed from: Seattle, WA
Test performed at: 2009-11-18 02:13:32 (GMT -08:00)

#      URL                         Status  Time      DNS (sec)  Connect (sec)  Redirect (sec)  First (sec)  Last (sec)  Total (sec)  Size (kb)
1      http://cat2009.iimk.ac.in   OK      02:13:32  0.0010     0.2887         0.0000          0.2962       0.0001      0.5860       0.11
2      cat2009.iimk.ac.in/cdn101.  OK      02:13:32  0.0000     0.2885         1.1008          0.3045       0.3180      2.0118       9.41
Total                              -       -         0.0010     0.5772         1.1008          0.6007       0.3181      2.5978       9.52

Figure 11. Web page test result (sample).
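In the sample result of Figure 11, each total is the sum of the five timing phases (DNS, connect, redirect, first byte, last byte). The short check below recomputes the totals from the figure's values; the variable and function names are chosen here for illustration only.

```python
# Timing phases in seconds, copied from Figure 11:
# (DNS, connect, redirect, first byte, last byte)
requests = [
    (0.0010, 0.2887, 0.0000, 0.2962, 0.0001),  # request 1
    (0.0000, 0.2885, 1.1008, 0.3045, 0.3180),  # request 2
]
reported_totals = [0.5860, 2.0118]
reported_grand_total = 2.5978


def total_time(phases):
    """Sum the timing phases of one request, rounded to 4 decimals."""
    return round(sum(phases), 4)


totals = [total_time(r) for r in requests]
grand_total = round(sum(totals), 4)

# Fraction of the overall time spent in the redirect phase.
redirect_share = sum(r[2] for r in requests) / grand_total
```

The value `redirect_share` gives the fraction of the measured page time that this sample spends in the redirect step introduced by URL rewriting.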
4. DATA ANALYSIS

The data obtained from the three simulations are analyzed in this section.

4.1. Normal Web Hosting Vs CDN Web Hosting

The results showed that the first method served the pages with a very low response time of 0.09 milliseconds. The second method also showed a relatively good response time of 0.17 milliseconds. However, because it was tested on a desktop, it will not sustain the load, nor will it be durable and reliable. The third and fourth methods were tested with 2 nodes and took longer, 0.52 milliseconds and 0.57 milliseconds respectively. This may be due to the small number of nodes. The fifth and sixth methods were tested with 5 nodes and the response time was 0.43 milliseconds for both. In the last two methods, the test was performed with 10 nodes and the response time was 0.39 milliseconds and 0.44 milliseconds respectively. Figure 12 shows the detailed analysis of all the methods.

Figure 12. Data analysis of normal web hosting Vs CDN web hosting.

The analysis shows that if CDN technology is used, the class of machine hosting the CDN, whether a server or a node, makes little difference. That is, for all 3 CDN configurations, which were implemented and tested on both a server and a node, the response time was almost the same. Secondly, the response time is somewhat longer when fewer nodes are used. Figure 13 gives a graphical illustration of the relative performance of the deployed experiments.

Figure 13. Performance analysis of normal web hosting Vs CDN web hosting. (Comparison chart, response time in milliseconds: first method 0.09; second method 0.17; third method 0.52; fourth method 0.57; fifth method 0.43; sixth method 0.43; seventh method 0.39; eighth method 0.44.)

4.2. Simulation Software - CDNSim

Figure 14 shows the CDNSim data report. As the chart shows, with 2 surrogate servers the mean response time is 4.33 seconds. With 5 and 10 surrogate servers, the response time is 2.44 seconds in each case.

Figure 14. Data analysis of CDNSim. (Response time in seconds: CDN 2, 4.33; CDN 5, 2.44; CDN 10, 2.44.)

4.3. Real-time CDN

Figure 15 shows the real-time CDN data report. With 2 surrogate servers, the mean response time is 4.44 seconds. With 5 and 10 surrogate servers, the response time is 2.76 and 2.73 seconds respectively.

Figure 15. Data analysis of real-time. (Response time in seconds: CDN 2, 4.44; CDN 5, 2.76; CDN 10, 2.73.)

4.4. CDNSim Vs Real-time CDN

Figure 16 shows the performance analysis chart for both the CDNSim simulation software and the real-time CDN. Comparing the two, the real-time response times are higher than the CDNSim values by 0.11 seconds with 2 surrogate servers and by approximately 0.3 seconds with 5 and 10 surrogate servers, which is insignificant.

Figure 16. Performance analysis of CDNSim Vs real-time. (Response time in seconds by number of surrogate servers: CDN 2, CDNSim 4.33 and real time 4.44; CDN 5, CDNSim 2.44 and real time 2.76; CDN 10, CDNSim 2.44 and real time 2.73.)
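The comparison in Section 4.4 can be reproduced from the values reported in Figures 14 and 15. The following sketch computes the absolute and relative gap for each configuration; the function name `compare` is introduced here for illustration.

```python
# Mean response times in seconds, keyed by number of surrogate servers,
# as reported in Figures 14 and 15.
cdnsim = {2: 4.33, 5: 2.44, 10: 2.44}
real_time = {2: 4.44, 5: 2.76, 10: 2.73}


def compare(sim, real):
    """Return (servers, difference in seconds, difference in % of sim) rows."""
    rows = []
    for n in sorted(sim):
        diff = real[n] - sim[n]
        rows.append((n, round(diff, 2), round(100.0 * diff / sim[n], 1)))
    return rows


for n, diff, pct in compare(cdnsim, real_time):
    print(f"{n:>2} surrogate servers: +{diff:.2f} s ({pct:.1f}% above CDNSim)")
```

Reporting the gap both in seconds and as a percentage of the simulated value makes it easier to judge how closely the real-time setup tracks CDNSim for each number of surrogate servers.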
5. CONCLUSION

CDN is a proven technology for handling flash crowds while maintaining good response time. This study has tested CDN technologies for efficient website access in a small academic setting. It has shown promising results for the response time of the website under flash crowds, and for delivering the content of the website efficiently by deploying existing resources with affordable and appropriate technologies that are relevant to practical Indian situations. In academic institutions and small organizations especially, where many nodes are underutilized, these nodes could be put to optimum use with this technique without disturbing the existing network.

When the stand-alone server is compared with the CDN server, and CDNSim with the real-time CDN, the differences in response time are small: about 0.3 milliseconds (stand-alone server versus the 10-node CDN) and at most about 0.3 seconds respectively, which is insignificant. In the stand-alone server method, the server is a single point of failure: as no redundant server-class machine is available, a failure results in server downtime. It may also lead to data loss due to a hard disk crash. If a CDN with a minimum of 4 or 5 nodes is deployed, with the data replicated on all nodes, there is practically no chance of data loss. Further, even if one or more of the nodes are down, site access is not affected. The approach can therefore be considered a very institution-friendly solution, as it ensures efficient content delivery over the network while making optimum use of the available resources.

However, it has been observed that in most organizations and academic institutes the nodes are switched off after office hours, whereas the nodes used for the CDN setup must be on round the clock for uninterrupted web service. This will consume more power and may also result in hardware depreciation. Even so, compared with the cost of outsourcing, it will be cheaper.

6. FUTURE SCOPE

As highlighted in the software overview above, CDNSim has certain inherent shortcomings, which could be overcome by analyzing the same setup with different simulation software such as NS2 or Opnet. Also, this study considered only textual data as the website content; future work could consider multimedia content such as audio, video, animations and graphics, and their relative content access performance.
REFERENCES

[1] A. Vakali and G. Pallis, Content Delivery Networks: Status and Trends, IEEE Internet Computing, November 2003, pp. 68-74.
[2] B. Krishnamurthy, C. Wills, and Y. Zhang, On the Use and Performance of Content Distribution Networks, Proc. 1st Int'l Internet Measurement Workshop, ACM Press, 2001, pp. 169-182.
[3] http://oswinds.csd.auth.gr/~cdnsim/
[4] Al-Mukaddim Khan Pathan and Rajkumar Buyya, A Taxonomy and Survey of Content Delivery Networks, http://www.gridbus.org/reports/cdn-taxonomy.pdf
[5] Y. Jung, B. Krishnamurthy, and M. Rabinovich, Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites, Proc. 11th Int'l World Wide Web Conf. (WWW 02), ACM Press, 2002, pp. 293-304.
[6] J. Coppens et al., Design and performance of a self organizing adaptive content distribution network, IEEE/IFIP Network Operations Management Symposium 2006, Vancouver, Canada, April 2006, pp. 534-545.