A Cluster Based Replication Architecture for Load Balancing in Peer-to-Peer Content Distribution




S. Ayyasamy (1) and S.N. Sivanandam (2)
(1) Asst. Professor, Department of Information Technology, Tamilnadu College of Engineering, Coimbatore-641 659, Tamil Nadu, INDIA. Email: ayyasamyphd@gmail.com
(2) Professor and Head, Department of Computer Science and Engineering, PSG College of Technology, Peelamedu, Coimbatore-641 004, Tamil Nadu, INDIA.
DOI: 10.5121/ijcnc.2010.2510

Abstract
In P2P systems, large volumes of data are declustered naturally across a large number of peers. However, it is very difficult to control the initial data distribution, because every user is free to share any data with other users. Replication improves system scalability by distributing the load across multiple servers, and replication techniques have been widely used to improve large-scale content distribution systems. Multiplying the sources of information geographically brings the demanded content closer to the clients, which in turn reduces both access latency and network traffic. In addition, due to the intrinsic dynamism of the P2P environment, a static data distribution cannot be expected to guarantee good load balancing. If hot peers become a bottleneck, user response time increases and system performance degrades significantly. An effective load balancing mechanism is therefore necessary, and it can be attained efficiently by intelligent data replication. In this paper, we propose a cluster based replication architecture for load balancing in peer-to-peer content distribution systems. It consists of an intelligent replica placement technique together with an effective load balancing technique. In the intelligent replica placement technique, peers are grouped into strong and weak clusters based on their weight vector, which comprises available capacity, CPU speed, access latency and memory size. To achieve complete load balancing across the system, intra-cluster and inter-cluster load balancing algorithms are proposed. Simulation results show that the proposed architecture attains lower latency and better throughput with reduced bandwidth usage.

Keywords
Replica, Overlay, Clusters, QoS, Content, Routing

1. Introduction

P2P Overlay Networks
Peer-to-peer systems are distributed computer architectures designed to share computer resources such as content, storage and CPU cycles directly, without an intermediate system or a centralized server. They are distinguished by their ability to adapt to failures and to maintain acceptable connectivity and performance [1]. Content distribution, an important peer-to-peer application on the Internet, has received significant research attention. By allowing personal computers to work as a distributed storage medium, such systems let users contribute, search for and obtain digital content.

Overlays are flexible and deployable approaches that allow users to perform distributed operations without modifying the underlying physical network. Peer-to-peer (P2P) overlay systems have been proposed to address a variety of problems and enable new applications. The attraction of these systems, compared with client/server frameworks, lies in their robustness, reliability and cost efficiency. Unlike traditional distributed computing, P2P networks aggregate large numbers of computers, and possibly mobile or handheld devices, which join and leave the network frequently. Nodes in a P2P network, called peers, play a variety of roles in their interaction with other peers. When accessing information, they are clients. When serving information to other peers, they are servers. When forwarding information for other peers, they are routers. This new breed of systems creates application-level virtual networks with their own overlay topology and routing protocols. To search for data or resources, messages are sent over multiple hops from one peer to another, with each peer responding to queries for information it has stored locally. Structured P2P overlays implement a distributed hash table data structure in which every data item can be located within a small number of hops, at the expense of keeping some state information locally at the nodes.

Replica Placement for QoS-Aware Content Distribution
Replication techniques are widely employed in content distribution systems to improve data availability, query latency and load balancing. By distributing multiple copies of data in the network, we can geographically multiply the sources of information. Forwarding each query to its nearest copy effectively reduces query search latency. Replication also improves system scalability by distributing the load across multiple servers [2].
If a replica of the requested object (e.g., a web page or an image) is kept close to the clients, they experience low access latency. The effectiveness of replication depends to a large extent on the position of the replicas. Centralized servers become a bottleneck as demand for the information increases. Content providers, system administrators or end users themselves manage this performance problem by delivering replicas of web content to machines spread throughout the network. Replicas reduce the load on the central server [3] by responding to local client requests. The load placed on the cooperating nodes includes communication bandwidth for sending the data to the requester, storage for hosting the replica, and CPU resources for query processing.

Replica management addresses the problem of deciding how many replicas each file should have and where they should be located. Enough replicas must be present to handle the demand for each file: with too few replicas, servers become overloaded and clients observe lower performance. On the other hand, extra replicas waste bandwidth and storage that could be reassigned to other files, as well as the money spent on renting, powering and cooling the host machines.

Load Balancing
In P2P systems, large volumes of data are declustered naturally across a large number of peers. However, it is very difficult to control the initial data distribution, because every user is free to share any data with other users. In addition, due to the intrinsic dynamism of the P2P environment, a static data distribution cannot be expected to guarantee good load balancing. Changing popularities of data items and skewed query patterns make the number of disk accesses unequal across peers, so that some peers become hot. This causes severe load imbalance throughout the system. If the hot peers become a bottleneck, user response time increases and system performance degrades significantly. A load balancing mechanism is therefore necessary, and it can be attained efficiently by online data migration/replication.

In this paper, we propose a cluster based replication architecture for load balancing in peer-to-peer content distribution systems. It contains an intelligent replica placement algorithm together with an effective load balancing technique. This paper is an extension of our previous work [18].

This paper is organized as follows. Section 2 reviews related work. Section 3 presents the system model and an overview of the algorithm for the proposed architecture. Section 4 presents the intelligent replica placement algorithm, followed by the searching technique. Section 5 describes the load balancing technique in detail. Section 6 gives the experimental results and Section 7 concludes the paper.

2. Related Works
Most of the research efforts to improve the performance of Gnutella-like P2P systems can be classified into two categories: 1) P2P search and routing algorithms and 2) P2P overlay topologies.
Most of the routing or search algorithms proposed in the first category disregard the natural peer heterogeneity present in most P2P systems and, more importantly, the potential performance hurdle caused by the randomly constructed overlay topology.

B. Mortazavi and G. Kesidis [4] have provided a survey of reputation systems. Based on a reputation framework, they have designed a game in which users play to maximize the files received from the system. To do so, the users adjust their cooperation level, thereby obtaining a good reputation.

Brighten Godfrey et al. [5] have proposed an algorithm for load balancing in heterogeneous and dynamic P2P systems. Their simulation results show that, in the face of rapid arrivals and departures of objects of widely varying load, their algorithm achieves load balancing for system utilizations as high as 90% while moving only about 8% of the load that arrives into the system. Similarly, in a dynamic system where nodes arrive and depart, their algorithm moves less than 60% of the load that the underlying DHT moves due to node arrivals and departures. Finally, they have shown that their distributed algorithm performs only negligibly worse than a similar centralized algorithm, and that node heterogeneity helps, not hurts, the scalability of their algorithm.

Kalman Graffi et al. [6] have proposed a DHT-based information gathering and analyzing architecture which controls the streaming request assignment in the system, and have thoroughly evaluated it in comparison to a distributed stateless strategy. They evaluated the impact of the key parameters in the allocation function, which considers the capabilities of the nodes and their contribution to the system. Identifying the quality-bandwidth tradeoffs of the information gathering system, they illustrate that their proposed system reaches 53% better load balancing and significantly improves the efficiency of the system.

Paraskevi Raftopoulou and Euripides G.M. Petrakis have presented iCluster, a self-organizing peer-to-peer overlay network for supporting full-fledged information retrieval in a dynamic environment. They defined the criteria for peer similarity and peer selection, and presented the protocols for organizing the peers into clusters and for searching within the clustered organization of peers [7].

Unfortunately, most existing work on replica placement has focused on optimizing an average performance measure of the entire client community, such as the mean access latency [8], [9]. While an average performance measure may be important from the system's point of view, it does not differentiate the likely diverse performance requirements of the individuals. So far, to the best of our knowledge, there has been no study on QoS-aware replica placement.

N. Carvalho, F. Araujo and L. Rodrigues have presented the IndiQoS architecture, a scalable QoS-aware publish-subscribe system with QoS-aware publications and subscriptions that preserves the decoupling which makes the publish-subscribe model so appealing. To support such a model, IndiQoS includes a decentralized message broker based on a DHT that leverages underlying network-level QoS reservation mechanisms [10].
Guillaume Pierre and Maarten van Steen have presented Globule, a collaborative content delivery network. The proposed network is composed of Web servers that cooperate across a wide-area network to provide performance and availability guarantees to the sites they host [12].

David Novak [14] suggested a new general solution to the load-balancing problem in P2P data networks, which is especially suitable for systems with time-consuming search operations. The proposed framework analyzes the source of the load precisely in order to choose the right balancing action.

The scalability and performance of DHTs depend strongly on an equal distribution of data across participating nodes. Because this concept is based on hash functions, one assumes that the content is distributed nearly evenly across all DHT nodes. Nonetheless, most DHTs show difficulties in load balancing. To preserve the major advantages of DHTs, namely scalability, flexibility and resilience, Simon Rieche et al. [15] have discussed three approaches to load balancing and compared them using simulation results.

Theoni Pitoura et al. [16] have presented Hot-RoD, a DHT-based architecture that deals effectively with the combined problem of range query processing and data-access load balancing through the use of a novel locality-preserving hash function and a tunable data replication mechanism which allows trading off replication costs for fair load distribution. Their detailed experimental study shows strong gains in both range query processing efficiency and data-access load balancing, with low replication overhead.

Ananth Rao et al. [17] have addressed the problem of load balancing in P2P systems. They explored the design space of load-balancing algorithms that use the notion of virtual servers. They have presented three schemes that differ primarily in the amount of information used to decide how to rearrange load. Their simulation results show that even the simplest scheme is able to balance the load within 80% of the optimal value, while the most complex scheme is able to balance the load within 95% of the optimal value.

Song Fu et al. [18] characterized the behaviors of randomized search schemes in the general P2P environment. They extended the supermarket model by investigating the impact of node heterogeneity and churn on the load distribution in P2P networks. They have proved that, using d-way random choice schemes, the length of the longest queue in P2P systems with heterogeneous nodal capacity and node churn is, for $d \geq 2$, $c \log\log n / \log d + O(1)$ with high probability, where $c$ is a constant.

3. System Model and Algorithm Overview

Algorithm Overview
In our QoS-aware topology, nodes are grouped into strong and weak clusters based on their weight vector, which comprises the following parameters:
- Available capacity
- CPU speed
- Memory size
- Access latency

In the replica placement algorithm, we classify the content as Class I and Class II based on its access patterns, i.e., the most frequently accessed contents are ranked as Class I and the less frequently accessed contents as Class II. More copies of Class I content are then replicated in strong clusters (those having high weight values). Routing is performed hierarchically by broadcasting the query only to the strong clusters. Our proposed architecture thus achieves low bandwidth consumption, reduced latency, reduced maintenance cost, strong connectivity and query coverage.
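The grouping into strong and weak clusters and the two content classes can be illustrated with a minimal Python sketch. It assumes the node weight formula given in Section 4, W = (BW + SP + MZ) / AL, a minimum weight threshold (called beta here) and a minimum access threshold (called a_min here). The names Node, node_weight, cluster_nodes and classify_content, and all numeric values, are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One peer with the four weight-vector parameters (units are arbitrary here)."""
    node_id: str
    bandwidth: float       # BW: available bandwidth
    cpu_speed: float       # SP: CPU speed
    memory_size: float     # MZ: memory size
    access_latency: float  # AL: access latency, assumed to be > 0


def node_weight(node: Node) -> float:
    """Weight as stated in Section 4: W = (BW + SP + MZ) / AL."""
    return (node.bandwidth + node.cpu_speed + node.memory_size) / node.access_latency


def cluster_nodes(nodes, beta):
    """Split nodes into (strong, weak): strong if W >= beta, weak otherwise.

    Both lists are ordered by descending weight.
    """
    ranked = sorted(nodes, key=node_weight, reverse=True)
    strong = [n for n in ranked if node_weight(n) >= beta]
    weak = [n for n in ranked if node_weight(n) < beta]
    return strong, weak


def classify_content(access_counts, a_min):
    """Map each content keyword to class 1 (count >= a_min) or class 2 (count < a_min)."""
    return {kw: 1 if count >= a_min else 2 for kw, count in access_counts.items()}


if __name__ == "__main__":
    # Hypothetical peers; the values are illustrative only.
    peers = [
        Node("n1", bandwidth=100.0, cpu_speed=3.0, memory_size=8.0, access_latency=10.0),
        Node("n2", bandwidth=10.0, cpu_speed=1.0, memory_size=1.0, access_latency=40.0),
        Node("n3", bandwidth=50.0, cpu_speed=2.0, memory_size=4.0, access_latency=5.0),
    ]
    strong, weak = cluster_nodes(peers, beta=5.0)
    print([n.node_id for n in strong])  # ['n3', 'n1']
    print([n.node_id for n in weak])    # ['n2']
    print(classify_content({"video": 120, "report": 3}, a_min=50))  # {'video': 1, 'report': 2}
```

Under this reading, class 1 content would be directed to the nodes in the strong list and class 2 content to the nodes in the weak list; how many copies to place per cluster is left open here.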
System Model
Let us consider a collection of N server nodes which form a peer-to-peer (P2P) overlay network. In addition to being part of the overlay, each node functions as a server responding to requests (queries) that come from clients outside the overlay network. For example, each node could be a web server, with the overlay linking the servers and the clients being web browsers on remote machines requesting content from the servers. We assume that each node always stores one copy of its own content item, which it serves to clients, and that it has additional storage space for k replicated content items from other nodes, which it can also serve [3]. Each object is associated with an authoritative origin server (OS) in the network, where the content provider makes updates to the object. The object copy located at the origin server is called the origin copy, and an object copy at any other server is called a replica.

4. Intelligent Replica Placement Algorithm

Clustering the Nodes
For each node $N_i$, $i = 1, 2, \ldots, n$, let
$BW_i$ - Available Bandwidth
$SP_i$ - CPU Speed
$AL_i$ - Access Latency
$MZ_i$ - Memory Size
1. The weight of the node $N_i$ is calculated as $W_i = \frac{BW_i + SP_i + MZ_i}{AL_i}$.
2. Form the vector $W = \{(S_i, W_i)\}$, which holds the node ids and their corresponding weight values, sorted in descending order of weight.
3. Let $\{S_k\}$ denote the set of strong cluster nodes ($0 \leq k < n$), which satisfy the condition $W_k \geq \beta$, where $\beta$ is the minimum threshold value for the weight.
4. Then the set $\{W_j\} = \{N_i\} - \{S_k\}$ denotes the set of weak cluster nodes ($0 \leq j < n$), which satisfy the condition $W_j < \beta$.

Replica Placement
Let QS be the query server which registers the query of each client. The query server stores the cluster information of each node along with the node id, marked as S or W for strong and weak clusters, respectively.
1. At time $T_k$, let m clients generate query requests $\{Q_m\}$ of the form q{nid, ckwd}, where nid is the node id of the client and ckwd is the keyword of the content to be retrieved.
2. The queries $\{Q_m\}$ are registered in the query server QS.
3. The requested contents of the queries are categorized as class1 or class2, depending on their access frequencies, i.e., a query $Q_j$, $j < m$, is considered to be class1 if $n(Q_j) \geq A_{min}$ and class2 if $n(Q_j) < A_{min}$, where $n(Q_j)$ is the number of accesses of the content pattern for the given query and $A_{min}$ is the minimum access threshold value.
4. The query server QS then assigns the class1 contents to the strong cluster nodes and the class2 contents to the weak cluster nodes.

5. After the assignment, QS transmits this replication pattern information to the origin server OS.
6. OS performs the replica placement according to the pattern information obtained from QS. The weight value $W_i$ of each node is stored along with the content.
7. OS then broadcasts the replication information to the respective clients in the format {Nid, Clid ('S' or 'W'), c1, c2}, where Nid is the node id, Clid is the cluster id and c1, c2 are content database ids.

5. Load Balancing Through Replication
In this section, we present intra-cluster and inter-cluster load balancing through the replication of data. Balancing the load within a particular cluster is called intra-cluster load balancing, and balancing the load among the clusters is called inter-cluster load balancing. Both are used in order to achieve complete load balancing across the system.

Replication Constraints
Load balancing can be attained through data replication by transferring hot data from heavily loaded peers to lightly loaded peers. Since the search is entirely distributed, one particular replica of a data item D may be accessed a large number of times while the other replicas go unused, so replication does not provide an absolute guarantee of load balancing. Replication does increase data availability, at the cost of disk space. Data that was hot previously may become cold subsequently, at which point its replicas are no longer needed, so a periodic cleanup of the replicas is necessary. In addition, the issues of replicating large data items need to be examined. Our main objective is to make sure that replication executed for short-term benefit does not cause long-term degradation in system performance by wasting valuable disk space at the peers.

We propose that the run-time decisions involving replication, for both the intra-cluster and the inter-cluster case, should be made as follows. Every cluster leader observes its peers' availability over a period of time.
The hot data should be replicated for availability reasons if the probability of a peer leaving the system is high. Replication is carried out only subject to the disk space constraints at the destination peers. In addition, large data items are replicated only if the disk capacities of the peers are larger than the size of those items.

Each peer $P_i$ maintains the set of data items $D_i$ replicated at it. To determine which data items are still hot, $P_i$ periodically checks the number of accesses $N_k$ during the last time interval for each item in $D_i$. The items for which $N_k$ is less than a predefined threshold $\alpha$ are deleted, since those items may not be hot anymore and no longer need to be replicated. Let the hot data items in a cluster be numbered $H_1, H_2, H_3, \ldots, H_m$ ($H_1$ is the hottest element). The original copy of each item remains stored at the peer which started the replication, i.e., the original data item is never deleted. To keep the system scalable over time, it is important to delete the replicas periodically, which frees disk space.
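The replication constraints above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class ReplicaStore and its methods are hypothetical names, disk space is modelled as a single number per peer, and the access counter is assumed to be reset at the end of each interval. Only replicas are tracked, so original copies held by the originating peer are never touched by the cleanup.

```python
class ReplicaStore:
    """Replicas held by one peer, with access counts for the current interval."""

    def __init__(self, disk_capacity):
        self.disk_capacity = disk_capacity
        self.sizes = {}     # item id -> size of the replica
        self.accesses = {}  # item id -> accesses in the current interval (N_k)

    def free_space(self):
        """Disk space not yet used by replicas."""
        return self.disk_capacity - sum(self.sizes.values())

    def add_replica(self, item_id, size):
        """Accept a replica only if the disk space constraint is satisfied."""
        if item_id in self.sizes or size > self.free_space():
            return False
        self.sizes[item_id] = size
        self.accesses[item_id] = 0
        return True

    def record_access(self, item_id):
        """Count one access to a replica held by this peer."""
        if item_id in self.accesses:
            self.accesses[item_id] += 1

    def cleanup(self, alpha):
        """Delete replicas with fewer than alpha accesses, then start a new interval.

        Returns the ids of the deleted (cold) replicas.
        """
        cold = [item for item, count in self.accesses.items() if count < alpha]
        for item_id in cold:
            del self.sizes[item_id]
            del self.accesses[item_id]
        for item_id in self.accesses:
            self.accesses[item_id] = 0
        return cold


if __name__ == "__main__":
    store = ReplicaStore(disk_capacity=100)
    store.add_replica("H1", 60)
    store.add_replica("H2", 30)
    for _ in range(5):
        store.record_access("H1")
    store.record_access("H2")
    print(store.cleanup(alpha=3))  # ['H2']
    print(store.free_space())      # 40
```

Running the cleanup on a timer, rather than on every access, matches the periodic check described in the text.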

Intra-Cluster Load Balancing
In intra-cluster load balancing, several decisions are critical to system performance: when to trigger the load balancing mechanism, how to detect hotspots, and how much data to replicate. In our approach, the workload statistics of each peer are sent to its cluster leader periodically. Load balancing is started when the cluster leader detects a load imbalance in the cluster. Intra-cluster load balancing has been studied extensively in conventional domains, but for P2P systems we must also take into account the changing available disk capacities of the peers.

The cluster leader CL periodically receives information about the loads W and available disk space S of the peers. Based on W, the cluster leader CL creates a sorted list l of the peers such that the first element of the list is the most heavily loaded peer. Let there be n elements in the list. Among the last [n/2] peers in the list, the peers whose values of S are less than a pre-specified threshold $S_{th}$ are deleted. Load balancing is then achieved by replicating the hot data H from the first peer in the list to the last peer, from the second peer to the second-last peer, and so on. The data is replicated only if the load difference between the peers exceeds a pre-specified threshold $\beta$.

Note that CL checks for load imbalance only at periodic time intervals, not whenever a peer joins or leaves the system. Load imbalances caused by peers joining or leaving are corrected by CL only at the next periodic time interval. We believe that performing load balancing every time a peer joins or leaves would be disastrous, because peers may join and leave the system frequently. The above steps are summarized in the following algorithm.

Algorithm Intra Cluster Load Balancing
1. For each cluster leader CL_i, i = 1, ..., k
2.   For each member P_{j,i}, j = 1, ..., n, of CL_i
3.     P_{j,i} sends W_{j,i} and S_{j,i} to CL_i
4.     CL_i adds P_{j,i} to the list l_i
5.   End For
6.   CL_i sorts l_i such that W_{1,i} >= W_{2,i} >= W_{3,i} >= ... (most heavily loaded peer first)
7.   For each element l_j, j = n/2, ..., n
8.     If S_{j,i} < S_th then
9.       Delete the element P_{j,i}
10.    End If
11.  End For
12.  If W_a - W_b > β for any a, b < n, then
13.    Move H_1 (N_1) into N_n, H_2 (N_2) into N_{n-1} and so on.
14.  End If
15. End For
16. End
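The intra-cluster steps can be sketched in Python as shown below. This is one possible reading of the listing, under several assumptions that the paper does not spell out: the load difference is checked per pair of peers, each heavily loaded peer contributes only its single hottest item, and the function returns a replication plan instead of moving data. The names Peer and plan_intra_cluster are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    peer_id: str
    load: float       # W: workload reported to the cluster leader
    free_disk: float  # S: available disk space
    hot_item: str     # H: hottest data item held by this peer


def plan_intra_cluster(peers, s_th, beta):
    """Return a list of (item, source_id, target_id) replication decisions."""
    ranked = sorted(peers, key=lambda p: p.load, reverse=True)  # heaviest first
    half = len(ranked) // 2
    senders = ranked[:len(ranked) - half]
    # Among the last n/2 peers, drop those with too little free disk space.
    receivers = [p for p in ranked[len(ranked) - half:] if p.free_disk >= s_th]
    plan = []
    # Pair the heaviest sender with the lightest remaining receiver, and so on.
    for src, dst in zip(senders, reversed(receivers)):
        if src.load - dst.load > beta:
            plan.append((src.hot_item, src.peer_id, dst.peer_id))
    return plan


if __name__ == "__main__":
    # Hypothetical cluster; the values are illustrative only.
    cluster = [
        Peer("p1", load=90, free_disk=50, hot_item="a"),
        Peer("p2", load=70, free_disk=50, hot_item="b"),
        Peer("p3", load=40, free_disk=2, hot_item="c"),
        Peer("p4", load=10, free_disk=50, hot_item="d"),
    ]
    # p3 is dropped as a receiver because its free disk space is below s_th.
    print(plan_intra_cluster(cluster, s_th=5, beta=20))  # [('a', 'p1', 'p4')]
```

Returning a plan keeps the sketch free of any networking code; a cluster leader would call such a function at each periodic interval rather than on every join or leave.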

Inter Cluster Load Balancing
Inter-cluster load balancing is necessary in order to prevent load imbalance among the clusters. We propose that such load balancing be carried out between neighboring clusters through cooperation between the cluster leaders, because moving data to distant clusters may incur high communication overhead to coordinate the movement.

The cluster leaders periodically exchange load information with their neighboring cluster leaders. The cluster leader $CL_k$ checks whether its load exceeds the average load of the set $\{CL_i\}$ of its neighboring cluster leaders by more than 10% of the average load. If it does, $CL_k$ determines the hot data items which should be moved. It sends a message with the disk space requirements of each data item to each cluster leader in $\{CL_i\}$, asking to transfer some part of its load to them. The leaders in $\{CL_i\}$ check the available disk space at each of their cluster members. If the space limits are satisfied, they send a message to $CL_k$ with their total loads and their available disk space. $CL_k$ then arranges the leaders in $\{CL_i\}$ which are ready into the list $l_k$, so that the first element of $l_k$ is the least loaded leader.

Let r denote the number of willing leaders in $\{CL_i\}$ and m the number of hot data items. If $r < m$, then $H_1$ is assigned to the first element of $l_k$, $H_2$ to the second element, and so on in a round-robin fashion, until all the hot items have been assigned. If $r \geq m$, the assignment of hot data to the elements of $l_k$ is performed in the same way, but in this case some elements of $l_k$ will not receive any hot data. Once the hot data has arrived at a cluster leader $CL_i$, the leader creates a list l of its peers sorted in descending order of load. Then, using intra-cluster load balancing, the cluster leader assigns the hot data to the elements of l. The above steps are summarized in the following algorithm.

Algorithm Inter Cluster Load Balancing
1. Consider a cluster leader CL_k.
2. CL_k exchanges {W} with {CL_i}, i = 1, ..., n
3. If (W_k - W_avg) > (W_avg * 0.10) Then
   (W_avg is the average load of {CL_i}, i = 1, ..., n, and W_k is the load of CL_k)
4.   For each member P_j, j = 1, ..., n, of CL_k
5.     CL_k sends S_j to {CL_i}, i = 1, ..., n
6.   End For
7.   For each CL_i, i = 1, ..., n
8.     For each member P_{v,i}, v = 1, ..., n, of CL_i
9.       If S_{v,i} > Min({S_j}, j = 1, ..., n) Then
10.        Send S_{v,i} and W_{v,i} to CL_i
11.      End If
12.    End For
13.    CL_i sends W_{v,i} and S_{v,i} to CL_k
14.    CL_k adds CL_i to the list l_k
15.  End For
16.  CL_k sorts l_k such that W_{v,i} < W_{v+1,i} < W_{v+2,i} < ...
17.  If r < m Then
18.    For each H_i of CL_k
19.      Move H_1 into CL_1, H_2 into CL_2 and so on.
20.    End For
21.  End If
22.  Apply Intra-Cluster load balancing to {CL_i}, i = 1, ..., n
23. End If
24. End.

6. Experimental Results

Simulation Setup
This section deals with the experimental performance evaluation of our algorithms through simulations. The NS2 simulator is used to test our protocol. NS2 is a general-purpose simulation tool that provides discrete event simulation of user-defined networks. We have used the BitTorrent packet-level simulator for P2P networks [13]. A network topology is used only for the packet-level simulator. Based on the assumption that the bottleneck of the network is at the access links of the users and not at the routers, we use a simplified topology in our simulations. We model the network with access links and overlay links. Each peer is connected by an asymmetric link to its access router. All access routers are connected directly to each other, modeling only an overlay link. This enables us to simulate different upload and download capacities as well as different end-to-end (e2e) delays between different peers.

Fig. 1: Topology of P2P overlay network

Simulation Results
We have simulated our cluster based replication architecture with load balancing (WithLB) and without load balancing (WithoutLB), and measured the throughput, delay and packet loss.

Based On Load
In our initial experiment, the load of the requested content is varied from 250 bytes to 2000 bytes. The response delay and received throughput are measured. Figure 2 shows that the delay increases as the load increases, and that the delay of WithLB is significantly less than the delay of WithoutLB. Figure 3 shows the aggregated throughput of all the client nodes which obtained their respective share of data. The figure shows that WithLB has more throughput than WithoutLB.

[Plot: PacketSize Vs Delay. x-axis: psize (bytes), 250 to 2000; y-axis: delay (s); series: WithoutLB, WithLB]
Fig. 2: Load Vs Delay

[Plot: PacketSize Vs Throughput. x-axis: psize (bytes), 250 to 2000; y-axis: Throughput (Mb/s); series: WithoutLB, WithLB]
Fig. 3: Load Vs Throughput

Based On Rate

In our second experiment, the query sending rate is varied from 250 Kb to 1 Mb. The response delay and received throughput are measured. In Figure 4, we can observe that, when the rate increases, the delay remains almost constant for WithoutLB but decreases in the case of LB. From the figure, it can be seen that the delay of LB is significantly less than the delay of WithoutLB.

[Plot: Rate Vs Delay. x-axis: rate (Kb), 250 to 1000; y-axis: delay (s); series: WithoutLB, WithLB]
Fig. 4: Rate Vs Delay

Figure 5 shows the throughput against rate. The throughput of WithLB is higher than that of WithoutLB and increases as the rate increases.

[Plot: Rate Vs Throughput. x-axis: rate (Kb), 250 to 1000; y-axis: Throughput (Mb/s); series: WithoutLB, WithLB]
Fig. 5: Rate Vs Throughput

Based on Simulation Time

In our last experiment, the simulation time is varied from 10 to 20 seconds. The response delay, packets lost and received throughput are measured. Figure 6 shows the throughput against time: the throughput of WithLB is higher than that of WithoutLB and remains constant as time increases. In Figure 7, we can see that the delay of WithLB is significantly less than the delay of WithoutLB. The number of packets lost is shown in Figure 8. As time increases, the number of packets lost also increases for WithoutLB. For WithLB, there is ultimately no packet loss.

[Plot: Time Vs Throughput. x-axis: time (s); y-axis: Throughput (Mb/s); series: WithoutLB, WithLB]
Fig. 6: Time Vs Throughput

[Plot: Time Vs Delay. x-axis: time (s); y-axis: delay (s); series: WithoutLB, WithLB]
Fig. 7: Time Vs Delay

[Plot: Time Vs Packet Lost. x-axis: time (s); y-axis: packets lost; series: WithoutLB, WithLB]
Fig. 8: Time Vs Packet Lost

7. Conclusion

In this paper, we have proposed a cluster based replication architecture for load balancing in peer-to-peer content distribution systems. Based on the weight vector, which includes available capacity, CPU speed, memory size and access latency, the nodes are classified into strong and weak clusters. Based on the access pattern, the content is classified into class I or class II by the

replica management algorithm. Then class I contents are replicated into strong groups for more copies. Routing is performed only to the strong clusters by broadcasting the query hierarchically. In addition to an intelligent replica placement technique, the architecture also includes an effective load balancing technique. In the intelligent replica placement technique, peers are grouped into strong and weak clusters based on their weight vector, which comprises available capacity, CPU speed, memory size and access latency. To achieve complete load balancing across the system, intra-cluster and inter-cluster load balancing algorithms are proposed. The simulation results show that the proposed architecture attains lower latency and better throughput with reduced bandwidth usage.

References

[1] Stephanos Androutsellis-Theotokis and Diomidis Spinellis, "A Survey of Peer-to-Peer Content Distribution Technologies", ACM Computing Surveys, Vol. 36, No. 4, December 2004, pp. 335-371.
[2] Xueyan Tang and Jianliang Xu, "On Replica Placement for QoS-Aware Content Distribution", INFOCOM 2004, Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, 7-11 March 2004.
[3] David Hales, Andrea Marcozzi and Giovanni Cortese, "Towards Cooperative, Self-Organised Replica Management", in Proc. of First IEEE International Conference on Self-Adaptive and Self-Organizing Systems, pp. 367-370, 9-11 July 2007, DOI: 10.1109/SASO.2007.62.
[4] B. Mortazavi and G. Kesidis, "Cumulative Reputation Systems for Peer-to-Peer Content Distribution", in Proc. of IEEE Annual Conference on Information Sciences and Systems, pp. 1546-1552, 22-24 March 2006, DOI: 10.1109/CISS.2006.286385.
[5] Brighten Godfrey, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, Ion Stoica, "Load Balancing in Dynamic Structured P2P Systems", IEEE INFOCOM 2004.
[6] Kalman Graffi, Sebastian Kaune, Konstantin Pussep, Aleksandra Kovacevic, Ralf Steinmetz, "Load Balancing for Multimedia Streaming in Heterogeneous Peer-to-Peer Systems", NOSSDAV '08, Braunschweig, Germany.
[7] Paraskevi Raftopoulou and Euripides G.M. Petrakis, "iCluster: a Self-Organizing Overlay Network for P2P Information Retrieval", in Proc. of ECIR, 30 March - 3 April 2008.
[8] P. Krishnan, D. Raz, and Y. Shavitt, "The cache location problem", IEEE/ACM Transactions on Networking, vol. 8, no. 5, pp. 568-582, Oct. 2000.
[9] L. Qiu, V. N. Padmanabhan, and G. M. Voelker, "On the placement of web server replicas", in Proceedings of IEEE INFOCOM '01, Apr. 2001, pp. 1587-1596.
[10] N. Carvalho, F. Araujo, L. Rodrigues, "Scalable QoS-Based Event Routing in Publish-Subscribe Systems", Fourth IEEE International Symposium on Network Computing and Applications, 27-29 July 2005.
[11] Guillaume Pierre and Maarten van Steen, "Globule: A Collaborative Content Delivery Network", IEEE Communications Magazine, vol. 44, no. 8, August 2006, DOI: 10.1109/MCOM.2006.1678120.
[12] Kolja Eger, Tobias Hoßfeld, Andreas Binzenhofer, "Efficient Simulation of Large-Scale P2P Networks: Packet-level vs. Flow-level Simulations", in Proceedings of 2nd Workshop on the Use of P2P, GRID and Agents for the Development of Content Networks, pp. 9-16, June 2007.
[13] David Novák, "Load Balancing in Peer-to-peer Data Networks", Masaryk University, Brno, Czech Republic, Oct 2006.

[14] Simon Rieche, Leo Petrak, Klaus Wehrle, "Comparison of Load Balancing Algorithms for Structured Peer-to-Peer Systems", Protocol-Engineering and Distributed Systems, Wilhelm-Schickard-Institute for Computer Science, University of Tübingen, Sep 2004.
[15] Theoni Pitoura, Nikos Ntarmos, Peter Triantafillou, "Replication, Load Balancing and Efficient Range Query Processing in DHTs", Research Academic Computer Technology Institute, and Computer Engineering and Informatics Department, University of Patras, Greece.
[16] Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, Ion Stoica, "Load Balancing in Structured P2P Systems", Elsevier Science Publishers B. V., Amsterdam, The Netherlands, Volume 63, Issue 3, March 2006.
[17] Song Fu, Cheng-Zhong Xu, Haiying Shen, "Random Choices for Churn Resilient Load Balancing in Peer-to-Peer Networks", in Proceedings of the 22nd ACM/IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2008.
[18] S. Ayyasamy and Dr. S. N. Sivanandam, "A QoS-Aware Intelligent Replica Management Architecture for Content Distribution in Peer-to-Peer Overlay Networks", International Journal on Computer Science and Engineering, Vol. 1(2), pp. 71-77, 2009.

About the Authors

Mr. S. Ayyasamy completed his B.E. (Electronics and Communication Engineering) in 1999 from Maharaja Engineering College and M.E. (Computer Science and Engineering) in 2002 from PSG College of Technology, both under Bharathiar University, Coimbatore. Currently he is pursuing a PhD degree at Anna University, Coimbatore. He is working as an Assistant Professor, Department of Information Technology at Tamilnadu College of Engineering, Coimbatore. He is a member of various professional bodies like ISTE, CSI and IAENG. His research areas include P2P networks, Overlay Networks, Load Balancing and Quality of Services, and he has 9 years of teaching experience in Engineering Colleges.

Dr. S. N. Sivanandam completed his B.E. (Electrical Engineering) in 1964 from Government College of Technology, Coimbatore, and MSc (Engineering) in Power Systems in the year 1966 from PSG College of Technology, Coimbatore. He acquired a PhD in control systems in 1982 from Madras University. He received the best teacher award in the year 2001 and the Dhakshina Murthy Award for teaching excellence from PSG College of Technology. He received the citation for best teaching and technical contribution in the year 2002, Government College of Technology, Coimbatore. His research areas include Modeling and Simulation, Neural Networks, Fuzzy Systems and Genetic Algorithm, Pattern Recognition, Multidimensional system analysis, Linear and Non linear control system, Signal and Image processing, Control System, Power System, Numerical methods, Parallel Computing, Data Mining and Database Security. He is a member of various professional bodies like IE (India), ISTE, CSI, ACS and SSI. He is a technical advisor for various reputed industries and engineering institutions.