VIDEO REPLICA PLACEMENT STRATEGY FOR STORAGE CLOUD-BASED CDN



Similar documents
6.7 Network analysis Introduction. References - Network analysis. Topological analysis

Maintenance Scheduling of Distribution System with Optimal Economy and Reliability

IDENTIFICATION OF THE DYNAMICS OF THE GOOGLE S RANKING ALGORITHM. A. Khaki Sedigh, Mehdi Roudaki

AN ALGORITHM ABOUT PARTNER SELECTION PROBLEM ON CLOUD SERVICE PROVIDER BASED ON GENETIC

Green Master based on MapReduce Cluster

Average Price Ratios

Dynamic Two-phase Truncated Rayleigh Model for Release Date Prediction of Software

Cyber Journals: Multidisciplinary Journals in Science and Technology, Journal of Selected Areas in Telecommunications (JSAT), January Edition, 2011

APPENDIX III THE ENVELOPE PROPERTY

Applications of Support Vector Machine Based on Boolean Kernel to Spam Filtering

The impact of service-oriented architecture on the scheduling algorithm in cloud computing

A Parallel Transmission Remote Backup System

Report 52 Fixed Maturity EUR Industrial Bond Funds

Dynamic Provisioning Modeling for Virtualized Multi-tier Applications in Cloud Data Center

A DISTRIBUTED REPUTATION BROKER FRAMEWORK FOR WEB SERVICE APPLICATIONS

SHAPIRO-WILK TEST FOR NORMALITY WITH KNOWN MEAN

ANOVA Notes Page 1. Analysis of Variance for a One-Way Classification of Data

A New Bayesian Network Method for Computing Bottom Event's Structural Importance Degree using Jointree

ADAPTATION OF SHAPIRO-WILK TEST TO THE CASE OF KNOWN MEAN

AnySee: Peer-to-Peer Live Streaming

An Approach to Evaluating the Computer Network Security with Hesitant Fuzzy Information

Optimal multi-degree reduction of Bézier curves with constraints of endpoints continuity

Optimal Packetization Interval for VoIP Applications Over IEEE Networks

Preprocess a planar map S. Given a query point p, report the face of S containing p. Goal: O(n)-size data structure that enables O(log n) query time.

Projection model for Computer Network Security Evaluation with interval-valued intuitionistic fuzzy information. Qingxiang Li

Automated Event Registration System in Corporation

Numerical Methods with MS Excel

Security Analysis of RAPP: An RFID Authentication Protocol based on Permutation

How To Make A Supply Chain System Work

Statistical Pattern Recognition (CE-725) Department of Computer Engineering Sharif University of Technology

Resource Management Model of Data Storage Systems Oriented on Cloud Computing

Using Phase Swapping to Solve Load Phase Balancing by ADSCHNN in LV Distribution Network

A particle swarm optimization to vehicle routing problem with fuzzy demands

A Study of Unrelated Parallel-Machine Scheduling with Deteriorating Maintenance Activities to Minimize the Total Completion Time

Load Balancing Algorithm based Virtual Machine Dynamic Migration Scheme for Datacenter Application with Optical Networks

The Gompertz-Makeham distribution. Fredrik Norström. Supervisor: Yuri Belyaev

ROULETTE-TOURNAMENT SELECTION FOR SHRIMP DIET FORMULATION PROBLEM

Speeding up k-means Clustering by Bootstrap Averaging

Impact of Interference on the GPRS Multislot Link Level Performance

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

An IG-RS-SVM classifier for analyzing reviews of E-commerce product

How To Balance Load On A Weght-Based Metadata Server Cluster

ANALYTICAL MODEL FOR TCP FILE TRANSFERS OVER UMTS. Janne Peisa Ericsson Research Jorvas, Finland. Michael Meyer Ericsson Research, Germany

Proceedings of the 2010 Winter Simulation Conference B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, eds.

Efficient Traceback of DoS Attacks using Small Worlds in MANET

IP Network Topology Link Prediction Based on Improved Local Information Similarity Algorithm

CHAPTER 2. Time Value of Money 6-1

Performance Attribution. Methodology Overview

10.5 Future Value and Present Value of a General Annuity Due

Low-Cost Side Channel Remote Traffic Analysis Attack in Packet Networks

T = 1/freq, T = 2/freq, T = i/freq, T = n (number of cash flows = freq n) are :

Classic Problems at a Glance using the TVM Solver

Fault Tree Analysis of Software Reliability Allocation

Chapter Eight. f : R R

ECONOMIC CHOICE OF OPTIMUM FEEDER CABLE CONSIDERING RISK ANALYSIS. University of Brasilia (UnB) and The Brazilian Regulatory Agency (ANEEL), Brazil

Optimization Model in Human Resource Management for Job Allocation in ICT Project

Fractal-Structured Karatsuba`s Algorithm for Binary Field Multiplication: FK

Banking (Early Repayment of Housing Loans) Order,

Integrating Production Scheduling and Maintenance: Practical Implications

Web Service Composition Optimization Based on Improved Artificial Bee Colony Algorithm

Simple Linear Regression

Study on prediction of network security situation based on fuzzy neutral network

Settlement Prediction by Spatial-temporal Random Process

of the relationship between time and the value of money.

Software Reliability Index Reasonable Allocation Based on UML

The analysis of annuities relies on the formula for geometric sums: r k = rn+1 1 r 1. (2.1) k=0

n. We know that the sum of squares of p independent standard normal variables has a chi square distribution with p degrees of freedom.

1. The Time Value of Money

Approximation Algorithms for Scheduling with Rejection on Two Unrelated Parallel Machines

Compressive Sensing over Strongly Connected Digraph and Its Application in Traffic Monitoring

Capacitated Production Planning and Inventory Control when Demand is Unpredictable for Most Items: The No B/C Strategy

DECISION MAKING WITH THE OWA OPERATOR IN SPORT MANAGEMENT

On Error Detection with Block Codes

Statistical Intrusion Detector with Instance-Based Learning

AP Statistics 2006 Free-Response Questions Form B

Reinsurance and the distribution of term insurance claims

Entropy-Based Link Analysis for Mining Web Informative Structures

The Popularity Parameter in Unstructured P2P File Sharing Networks

Transcription:

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 VIDEO REPICA PACEMENT STRATEGY FOR STORAGE COUD-BASED CDN 1 SHIJIA. YAO, 2 WENE ZHOU, 3 HAOMIN CUI AND, 4 MING. ZHU 1,2,3,4 Departmet of Automatc of Uversty of Scece ad Techology of Cha, Hefe, Cha Emal: 1,2,3 { yaos,chhmm,wlzhou}@mal.ustc.edu.c, 4 mzhu@ustc.edu.c ABSTRACT The ole vdeo servce eed the support of CDN(Cotet Delvery Networs). Compared wth tradtoal CDNs, t ca save a lot of cost by usg cloud-based storage odes to delver the vdeo cotet. To guaratee ed users QoS, CDN should pre-deploy the cotet fles of ole vdeo servce to the edge odes whch are close to the users. Exsted researches have show that the cost of buldg CDN by cloud storage odes s much less tha that of usg tradtoal CDNs. The exsted off-le replca placemet algorthm amed GS(Greedy Ste) ca meet the QoS requremet wth relatvely small cost whe the formato of users requests s provded. However GS wll result bad load balace ad t eed the formato of users requests. I ths paper, two classes of offle algorthms are proposed. Oe amed GUCP(Greedy User Core Preallocato) effectvely solved the load mbalaced problem caused by GS,ad the other oe amed PBP(Popularty Based Placemet) whch s based o the popularty of cotet effectvely placed replcas whle there s o users requests formato. Numercal expermets have demostrated the effectveess of the algorthms above. Keywords: Cloud storage, CDN, Replca Placemet, oad balace,qos. 1. INTRODUCTION Wth the rapd developmet of ole vdeo servce, there has bee a large umber of small compaes of ole vdeo servce. The vdeo ole servce eeds the support of CDN(Cotet Delvery Networs). Tradtoal CDNs such as Aama ad Mrror Image have deployed tes of thousads of data ceters ad edge servers to delver cotet across the globe. Ufortuately, the prce of tradtoal CDNs(such as Aama) s so much hgher tha small orgazatos such as medum-szed eterprses, govermet ageces, uverstes, ad chartes [1].The prce of buldg CDNs or hrg exstg CDNs s much hgher tha the ablty of face of medum-szed vdeo servce provder. As a result, the dea of utlzg storage clouds as a poor ma s CDN s very etcg. The cloud storage provders promse the ablty of rapd, cheap readg ad wrtg ad are easy to be expaded to meet flash crowds of web stes. Ecoomes of scale, terms of cost effectveess ad performace for both provders ad ed users, ca be acheved by leveragg exstg storage cloud frastructure, stead of vestg large amouts of moey ther ow cotet delvery platform or utlzg oe of the cumbet operators le Aama [2]. The recet emergece of storage cloud provders such as Amazo S3, Nrvax ad Racspace has opeed up ew opportutes to provde cost-effectve CDNs. Storage cloud provders operate data ceters that ca offer Iteret-based cotet storage ad delvery capabltes wth the assurace of servce uptme ad ed user perceved servce qualty. Servce qualty s typcally the form of badwdth ad respose tme guaratees [3]. Utlzg storage cloud buldg CDNs ca effectvely reduce the cost of cotet storage ad delvery. I rest of ths paper, the based-cloud storage CDN s called cloud CDN for short.. It s dffcult to use multple cloud storage to provde servce of CDN, because each cloud storage provders offers dfferet Web servces or programmer APIs ad each servce s best utlzed va uque Web servces or programmer APIs ad has ther ow uque qurs. May Web stes have utlzed dvdual storage clouds to delver some or all of ther cotet [4], most otably the New Yor Tmes [5] ad SmugMug [6], however, there s o geeral-purpose, reusable framewor to teract wth multple storage cloud provders ad leverage 61

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 ther servces as a cotet delvery etwor. Most storage cloud provders ust provde basc fle storage ad delvery servces. But do ot offer the capabltes of a typcally CDN such as automatc replcato, fal-over, geographcal load redrecto, ad load balace. Furthermore, a customer may eed coverage more locatos tha offerg by a sgle provder. The MetaCDN s a system that utlzes umerous cloud storage provders order to create a overlay etwor that ca be used as a hgh-performace, relable, ad redudat geographcally dstrbuted CDN to solve these problem [7]. Storage cloud provders charge ther customers by ther storage ad badwdth usage followg the utlty computg model [8]. Storage cost s measured per GB per ut tme ad badwdth cost s measured per GB trasferred. Badwdth cost cossts of upload cost for comg data ad dowload cost for outgog data. The costumers of storage cloud, cloud CDN are accustomed to utlze dfferet cloud storage provders order to reduce the cost. Because cloud storage ca be scaled o-demad, cloud CDN ca be easly adusted accordg to demad. Cloud CDN ca offer multple resources to multple customers as tradtoal CDN. I other words, cloud CDN ca provde servce such as tradtoal CDN, but wthout matag or owg ay frastructure. The fle of ole vdeo servce s very large, the respose tme should be as short as possble as possble. The replcas should be placed o the edge odes, whch are earest to the users. I ths paper, two ds of off-le replca placemet algorthms for cloud CDN whch provdes servce for ole vdeo servce were proposed. Oe algorthm amed GUCP could be used to place replcas whe cloud CDN has users ad requests from them. Ths algorthm could solve the load mbalace problem whch s caused by GS. The other algorthm amed PBP could used to place replcas whle there s o formato of users requests cloud CDN. 2. REATED WORKS The replca placemet problem tradtoal CDNs refers to fdg the best set of servers to place cotet replcas. Replca placemet belogs to the NP-complete class of problems [9]. The CDN performace ca be affected by decsos such as: 1. The umber of surrogates requred. 2. Ther locato. 3. The cost model adopted cludg storage cost, cotet retreval cost, ad cotet updatg cost. 4. QoS cosderatos. A cosderable amout of research has bee doe for replca placemet CDNs. The cost model has evolved to clude oe or more of the three types of costs: retreval (or dowload), storage ad update (or upload) cost. I terms of mmzg cotet retreval cost oly, et al. [1] ad Krsha et al. [11] showed that replca placemet geeral etwor topologes s NP-complete ad provded optmal solutos for tree topologes. Qu et al. [12] evaluated a umber of heurstcs ad foud a greedy algorthm offerg the best performace. I addto to retreval cost, Xu et al. [13] ad Ja et al.[14] further added update cost, whereas Cdo et al. [15] added storage cost to cosderato. Furthermore, Kalpas et al. [16] comprehesvely cosdered all three costs (retreval, update ad storage) ad offered solutos for a tree topology oly. However, oe of the wor studed the case whch provsog cost betwee. MetaCDN [7] system s a commercally avalable cloud based CDN that provdes a terface for stadard cloud provders to be used for cotet delvery. The system, va a approprate web portal, grats ed users wth a umber of dfferet optos related to cost ad QoS. Specfcally, t eables a set of replca deploymet optos that cosequetly defe the request redrecto polces [17]. However, the detals of the replca placemet strateges are ot provded. Che et al. [18] are the frst oes to vestgate the problem of placg server replcas storage cloud CDNs alog wth buldg cotet dstrbuto paths amog them. Ther goal s to mmze the cost curred o CDN provders whle satsfyg QoS requremets for ed users. I the proposed wor, we go oe step further by cosderg the optmzed replca placemet problem ad dstrbuto path costructo for etwored cloud evromet. Two ds of offle replca placemet algorthms for cloud CDN ether have or ot the formato of users request. 3. COUD CDN AND ITS ERFORMANCE 3.1 Cloud CDN The cost of cloud CDN results from the badwdth ad storage, whch s chagg wth users load. Itellget replca placemet ad users redrecto strateges are requred. The problem of replca placemet s vestgated wdely the tradtoal CDN. However, the exstg results o 611

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 replca placemet caot be used o cloud CDN settg for the flowg reasos: Much research o tradtoal CDNs assumes that the etwor topology s gve, for example, the tree structure whch root s the org server. However, the cloud evromet, the CDN bulders have the freedom to buld ay topology. Thus, the problem of replca placemet cloud CDN s a ot problem of buldg dstrbuto paths ad replcato. The cost of replcas copy from ste u to v s d ( u; v ). For tradtoal CDN, the edges are usually udrected,.e. d ( u; v ) = d ( v; u ). However, the prces for uploadg ad dowload storage cloud are dfferet, whch demostrates that the edge s drected. Ths mples that oly choosg a set of replca odes s ot eough; replcato drectos should be preseted. Fgure 1. The Problem of Replca Placemet Cloud CDN Replca placemet problem cloud CDN s gve fgure 1.The potetal replcas odes are C1 -- C 4. Some potetal paths of dstrbuto from C are demostrated by bold le. It s assumed that each odes have the path to every users U.However, oly a subset of these paths ca satsfy the requremets of QoS for users request whch are draw as dashed le fgure 1(A). A d of soluto of replca placemet s show Fgure 1(B), where C 3 s chose to provde servce for to the request from U 1 ad C 4 s chose to provde servce for the request from U 2 ad U 3. C 1 s chose to provde servce from U 4 as well as forward the replca from C to C3 ad C [18] 4. I Ref. [18], a off-le replca placemet algorthm s proposed. Va trace-based study, t s show that cloud CDN sgfcatly reduces the cost of provdg CDN servce to small Web stes to as low as 2.62 US Dollar compared to the 99 US Dollar mmum charge by a tradtoal CDN. However the GS algorthm does ot cosder the load balace problem, whch s assumed that the cloud CDN maagers have both the formato ad the requests from the users. I ths paper, two classes of off-le replca placemet algorthms for cloud CDN were proposed, whch provde servce for ole vdeo servce.oe of the algorthms amed GUCP could be used to place replcas whe cloud CDN. It could solve the load mbalace problem caused by GS. The other amed PBP could be used to place replcas whle there s o formato of users requests cloud CDN. 3.2 The Cost of Cloud CDN Multple storage cloud odes copy replcas from the org server C, ad the respod to all the edge users requests. The cloud CDN uses multple storage odes or data ceters, however, each storage ode belogs to a certa provder. The cloud odes from dfferet provders may be colocal, but they may offer dfferet servce prces as well. It s assumed that there are m edge users U = { U1, U 2,, U m } dex by, ( = 1,2, m),ad cloud storage odes C = { C1, C2,, C } dex by or, (, = 1,2, ). I ths paper, t s assumed that the sze of replca s T.The cloud storage odes wll chage wth the data storage, put ad output traffc. Node C wll charge ut storage cost of S per GB for storg the replca, P per GB for replca uploadg traffc ad D per GB for replca dowloadg traffc. et V uv deote the cost of replca coped from ode u to ode v. V uv has two dfferet meags whch are depeded o the ature of ode v. Case I: ode v s the ode of storage cloud ode. V uv s the cost of opeg ode, that s to say, the cost of ode v = C dowloadg replca from ay ode whch has t. The org servce wll update the replca real tme. I ths paper, t s assumed that F T s the cotet eeds to be updated per ut tme, where F represets the frequecy of cotet update, Vuv = V = ( S + P F + D F) T. The ode C whch has the orgal replca s assumed to be the org server. The path of replca dstrbuto wll be a tree structure, the root ode of whch s C, as s show fgure 1(B). 612

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 Case II: v V uv dcates a user, v = U, V uv s the access cost of user who s assged to ode u = C.The sze of cotet whch edge user U request s T ad Vuv = V = T D. T may be smaller tha T or larger tha or equal T, because of users dd ot request s ot the whole replca or usg web cache, or repeated request the wthout cachg. I summary, the cost of cloud CDN s the sum of the cost of replcas placed o the storage cloud ode(.e. ope ode) ad the cost of servce for the requests from users, Cost = OpeCost + UserCost I ths paper, the cost of uploadg, dowloadg ad storage of each ode s defed radomly. The cost s measured by abstract moetary values (ot a specfc currecy such as the U.S. dollar). The value of cost s used to compare the performace of dfferet algorthms, but s ot the real moey ths paper. 3.3 The Defto of QoS Cloud CDN There s sgfcat correlato betwee etwor delay ad route dstace. Because the actual hop cout formato betwee users ad cloud stes s dffcult to acheve, the geographc dstace s cosdered as a dcator of delay. I ths paper, the route dstace s defed as the geographc dstace betwee user ad storage cloud ode. Moreover, the Eucld dstace s chose as the route dstace our smulato expermets. I fact, the choce of dstace metrc does ot mpact the performace of our algorthms; ay dstace metrc that s capable of descrbg the QoS requremet s applcable. et R deotes the route dstace betwee user U ad ode C. R represets the commucato qualty betwee two odes. The smaller R (the dstace from ode to user) s, the better the qualty of commucato wll be obtaed. I ths paper, the storage cloud odes are utlzed as replca servers. The abltes of badwdth, relablty ad cocurrecy of replca server odes are cosstet. The QoS dstace could be represeted by route dstace. To satsfy the requremets of QoS from the users, t should be esured that the route dstace s less tha QoS threshold Q,.e. R Q. The GS algorthm assgs the users to the earest ode. The umber of users s very large, ad the total route dstace wll be creasg wth the users. Hece, the average route dstace s used to measure the QoS performace. The average route dstace R s obtaed by (1). Where m = ( ode user ) + ( ode user ) 2 2 (1) = 1 = 1 R X X Y Y m X ad Y respectvely deote ode ode horzotal ad vertcal coordates of the ode, whch s assged to users, m s the total umber of users. 3.4 The oad Balace of Cloud CDN The load balace s a mportat performace dstrbuted system. To acheve the mmum of the rug tme, the dstrbuted systems assg obs accordg to the servers performace. I the process of ole vdeo servce, the tme of the coecto betwee users ad servce odes s very log. I order to serve more users wth lmted odes, the mprovemet of the load balace performace of multple vdeo server odes s a very mportat problem. I ths paper, t s assumed that the resource of load s the umber of users served by cloud storage odes, whle the other resources (storage, badwdth or CPU ad so o) s ot tae to accout. The umber of users whch ode provdes servce to s used to scale the performace of load. I ths paper, we used the load weght [19] to descrbe the performace of load cloud CDN. It s assumed that there are cloud storage odes cloud CDN. The ablty of load of cloud storage ode C, = 1,, s ω, = 1,,. Based o the prcple that the ode load should match wth ts load ablty, We assume that the ablty of cloud storage odes s the same,.e. ω = ω = = ω = ω (2) 1 2 It s assumed that the load of C s, load weght s defed as the rato of curret load ad ablty of load. W = = (3) ω ω Whe the load balace s acheved, W = W, ;, = 1,2,, (4) The average load weght s defed as formula (5). W 1 1 = W = (5) = 1 ω = 1 oad weght could be represeted by formula (6) whe the cluster s balacg. W 1 2 1 1 = W = = W = W = = 1 ω (6) = 1 For both ad ω are fxed, the load weght 613

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 could be represeted approxmately as formula (7) whle the cluster s balacg. 1 = 2 = = = W ' (7) The load of a ode s assocated wth ts uses formula (7). The load coeffcet s defed to evaluate the load performace. I ths paper, the mea square devato of load s used to descrbe the load coeffcet of cloud CDN. The load coeffcet η s defed as formula (8).The smaller η s, the load s better. = = 1 η ( W ') 2 (8) 3.5 The Other Performace of Cloud CDN Performaces metoed above ca be used to measure cloud CDN wth formato of users requests. Ufortuately, t fals whe there s o formato of users request to use. After replcas are placed o the odes radomly, requests from users wll be redrected to the earest odes. If the redrected ode has o replca whch s eeded, the ode wll dowload t from the other odes. The process of copyg replcas from the other odes wll result addtoal cost ad tme delay. et Push _ Cost deote the addtoal cost, ad t ca be represeted as: Push _ Cost = T ( S + P + D ) (9) l = 1, = 1, l = 1 I (9), deotes the ode whch dowloads ad stores the replca from the other odes, ad deotes the replca provder. Tl s the sze of the replca. et DEAY deote the tme delay ad Avg _ Push _ Delay deote the average tme delay. It s cosdered that we should dowload replcas from ode to whe DEAY s ozero, ad M s the tme of dowloadg. It s assumed that the value of DEAY s the route dstace betwee ode ad. Avg _ Push _ Delay ca be obtaed by (1). Avg _ Push _ Delay = DEAY / M (1) = 1, = 1, 4. THE VIDEO REPICA PACEMENT AGORITHM WITH REQUESTS INFORMATION I ths paper, cloud CDN vestgated s for ole vdeo. The cotet of the ole vdeo has large fle sze ad low updatg frequecy. The users must be respoded to as soo as possble, hece, the vdeo replca should be placed o the edge odes. The cloud CDN may ether have or ot the formato of users requests. O the oe had, f the cloud CDN has the formato of users requests, t could be used to push the cotet to the edge odes; O the other had, f cloud CDN has o requests formato, PBP (popular based placemet) algorthm could s used to place the replca based o the cotet popular. 4.1 GS Algorthm The GS(Greedy Ste) algorthm s proposed [18],GS s adopted from a approxmato algorthm for the Set Coverg Problem [2]. GS teratvely decdes to ope a closed ste whch has the maxmum utlty ad assg all ts potetal users to t. A potetal user of a ode has mmum route dstace ad t has ot bee assged to ay ode. The Cloud CDN opes a storage cloud ode, ad the fds the ext oe to ope, utl all of the users are assged to a certa ode. et O deote the cost to ope C. fc sopen O = (11) m { Csope} V otherwse The algorthm 1 s the pseudo-code of GS. Algorthm 1 : Greedy Ste [18] E s the set of user who have ot bee assged E s the curret set of users who ca be assged to whle E do W w, U E K arg max { C scosed} W D W + O Assg all users E to C Ope C E E E ed whle 4.2 The Descrpto of oad Imbalace Problem The assgmet of users GS s operated accordg to the route dstace, whch may lead to load mbalace. It s assumed that there are 2 odes ad 6 users the locato where the replcas should be placed. oad mbalace wll come from allocatg users to odes usg GS. The load mbalace may occur whe the GS algorthm s used. The load C 614

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 coeffcet of cluster s η = 1411.71, the maxmum ad mmum load of ode are M ax = 155 ad M = 54 respectvely. The load threshold defed Ref. [21] s Th = (1 + ) / 2, where s the average load of all the odes. Ufortuately, the defto s ot suted for the cloud CDN, whch s proved by expermets. We defe the ew dyamc load threshold of cloud CDN as (1), where α s the mbalace factor tag values (,1]. The value of α wll be gve secto VI by expermets. Th = W ' α (12) The users of odes should be adusted to esure the load balace, whe p > q ad Th ad p q; p, q = 1,2,,. To solve the p q load mbalace caused by GS, A ew algorthm amed GUDP(Greedy User Core Preallocato) s proposed. 4.3 GUCP Algorthm The GUCP(Greedy User Core Preallocato) wll adust the assgmet whe load mbalace appears, after assgg users to the earest odes. Some users of ode p should be assged to q, whle p q Th, p q; p, q = 1,2,,. It s assumed that the coordate of ode s uformly dstrbuted. K-meas algorthm s a ormal cluster algorthm. It s a clusterg method based o statstcs. K-meas clusterg s a method of vector quatzato orgally from sgal processg, that s popular for cluster aalyss data mg. K- meas clusterg (MacQuee, 1967) s a method commoly used to automatcally partto a data set to groups. It precedes by selectg tal cluster ceters ad the teratvely refg them as follows [22] : 1. Each stace U s assged to ts closest cluster ceter. 2. Each cluster ceter CORE s updated to be the mea of ts costtuet staces. Frstly, K-meas algorthm s adopted to accomplsh the clusterg of the users. The, about ( ) / 2 users should be pced from ode p p q ad added to q. K-meas algorthm could be used to cluster the users of ode p, ad move users belogg to p ad earest to q batches. But t s dffcult to get a sutable value of for dfferet data sets. I ths paper, we wat to move about ( ) / 2 users because the users are p q uform dstrbuto at the locato eed to place replcas. The umber of cores -meas ca be gve by (13): N = 2 ( ) (13) p q A core CORE s selected from N cores whch are clustered by -meas,ad t s earest to q. The, we move the users that correspod to CORE to q. The we chec whether the system s load mbalace aga. If load mbalace, we wll repeat the operato of users assgg adustmet utl load balace. The core dea of GUCP s usg -meas to cluster the users whch are assged to the over load ode, ad the adustg the users by the cores to load balace of cloud CDN. The system acheves balace qucly because of the frequecy of adustmet s reduced by ths algorthm. The algorthm 2 s the pseudo-code of GUCP: Algorthm 2 : Greedy User Core Preallocato E s the set of user who have ot bee assged E s the curret set of users who ca be assged to whle p q Th ( p q; p, q = 1,2,, ) do N cores Kmeas ( U ) Fd m dstace Assged users ed whle whle E W w, U E K p C Core _ um betwee Cq ad cores U core _ um to C q Assg all users E to C Ope C E E E ed whle arg max { C scosed} W D W + O 5. THE VIDEO REPICA PACEMENT AGORITHM WITHOUT REQUEST INFORMATION 5.1 The Popularty of Vdeo cotet I the last secto, a vdeo replca placemet algorthm for cloud CDN s proposed for cloud CDN wth formato of users requests. However, f cloud CDN has o requests formato, the preseted algorthm the last secto fals. Hece, ths secto,a ew algorthm based o popularty of vdeo replca s proposed for cloud CDN wthout requests formato. Popularty of vdeo cotets The qualty of servce of CDN s affected by the placemet strategy of the copes [23]. o the 615

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 oe had, If the replcas are placed more tha they are eeded, the storage space s wasted. O the other had, whe the copes are placed too few, the t wll result low servce qualty. Although the CDN ca adust the umber of copes tself, t wll cost a lot of tme ad much operato of I/O. The popularty of replca s a mportat bass for placemet. Methods of decso tree ad eural etwor model to predct the popularty of moves were proposed Ref. [24] ad Ref. [25]. These methods ca be used cloud CDN whe t has o requests formato. The dstrbuto of popularty of replca s assumed as Zpf dstrbuto Ref. [26] ad Madelbrot-Zpf dstrbuto Ref. [27], but the dstrbuto of vdeo popularty obeys ether of them. s. The extesve law model s sutable for the dstrbuto of popularty of move [28]. The 1722 peces of moves request formato are got from http://move.youu.com 213. It s show that the use of stretched expoetal model o come popularty data fttg ad foud more le wth the stretched expoetal model fgure 2. It proves that the dstrbuto of vdeo popularty s a stretched expoetal dstrbuto. The probablty desty fucto (PDF) of stretched expoetal s show (12). The parameters of the vdeo popularty dstrbuto fucto ca be got, whch s stretched expoetal c =.33 ad x = 27 c 1 c p( ) = c exp[ ( ) ] x c, = 1,, N (12) x 5.2 The Relatoshp betwee Number ad Popularty of Replcas It s assumed that cloud CDN has N odes, whch have bee opeed. et NUM deote the umber of replcas of cotet ad POP deote the popularty of vdeo cotet. It s ow to us that the more popular the vdeo cotet s, the more replcas ars eeded. Hece t s assumed that the most popular cotet s replcas are placed o all of the storage cloud odes, ad the, the umber the most popular cotet the replcas s N. It s assumed that the rato of POP ad MAX _ POP s equal to the rato of NUM ad N. NUM = N POP / MAX _ POP (13) 5.3 PBP Algorthm To place the replcas sutably wth requests formato, a ew algorthm amed PBP (Popularty Based Placemet) s proposed. et T deote the cotet set ad T deote the cotet of.pbp utlzes popularty of vdeo cotet to place the replcas. For every cotet T, The popularty of T s gve frst, the calculate the umber of replcas of T NUM by (13) lastly, select the odes are selected wth mmum cost from the opeed odes to place the replcas. Algorthm 3 s the pseudo-code of PBP: Algorthm 3 : Popularty Based Placemet T s the set of cotet whch have ot bee placed C s the curret set of odes whch ca be placed replca N s the umber of cloud storage odes for( = ; < T. legth() ; + + ) POPT Get_Popularty( T ) NUM N POP / MAX _ POP C ' C for( = ; ed for ed for < NUM T ; + + ) arg m( T D ) C ' Place T o C ' C ' C C Fgure 2. Fttg Of Data Ad Stretched Expoetal Model 616 5.4 Radom Algorthm The PBP algorthm gves the umber of replcas by cotets popularty. I Ref [29], radom algorthm s used to place the replcas. I ths paper, radom algorthm s chose to compare

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 wth the PBP. as (14). NUM s gve radom algorthm NUM = rad()%( N + 1) (14) 6. EXPERIENCE AND DISCUSSION I ths secto, we preset the parameters for umercal expermets secto A, ad the 6.1 Parameters of Smulato Frequcy of Adustmets 1 9 8 7 6 5 4 3 2 1 I ths secto, umercal expermets are performed to evaluate the performace of the proposed algorthms Matlab. It s assumed that there are 2 odes whose cost of uploadg, dowloadg ad storage are radom umbers tag values (,1]. The cost values are assumed to be dfferet from each other ad the users s less tha 1. The badwdth of each ode s 1 Mbps; the users are radomly dstrbuted a rg, of whch the sde dameter s 1 ad the outsde dameter s 2. Whe every user watches vdeo, a average of 1Kb s occuped.2 odes ca serve 2 users at most. Hece, the smulato expermet va 1 users wll ot cause overload. The umber 2 reflects the curret status of the umber of cloud storage provders s ot much. The umber 2 reflects the fact that there are ot so may cloud storage provders. However, eve f more tha 2 odes are chose, t has lttle fluece o the fal results. The value of mbalace threshold has fluece ot oly o the value of load coeffcet η, but also o the frequecy of adustmet whe load mbalace occurs. Reasoable threshold value ca esure less frequecy of adustmet whle the load coeffcet η s smaller. To get the value of dyamc mbalace factor α,we assume that there are 6 users usg the cloud CDN. The evoluto of the load coeffcet ad the adustmet frequecy are show fgure 3 ad 3 respectvely, as α vares from 5% to 99%. Fgure 3. Value of oad Stuato 1 2 3 4 5 6 7 8 9 1 Value of α(%) 35 3 25 2 15 1 5 Fgure 4. The Relatoshp of oad of α ad Number of Adustmets 1 2 3 4 5 6 7 8 9 1 Value of α(%) The Relatoshp of α ad oad Coeffcet From the fgures 3 ad 4, we ca see that at the pot 25%, the adustmet frequecy has decreased to some acceptable extet, besdes, the load coeffcet s ot very large. Hece, α s chose as 25%,cosequetly, the load threshold Th = W '.25. Accordg to the dstrbuto of odes ad users, It s assumed that the average dstace betwee users ad odes s 1, the threshold of QoS ca be chose as 1, the upper lmt of route dstace s 1. The parameters of umercal expermets are lsted table I. Table1. Parameters of Numercal Smulato Expermet Parameter Type Nodes User s Cotets umber Cotet s sze Q α Value 2 1 ~1 1 ~1 1 25% 617

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 6.2 Result of Numercal Expermet for Algorthm wth Requests Iformato Three sets of expermets are carred out, wth the users beg a varable factor. GUCP s compared wth GS load, cost ad average route dstace, respectvely, as the users s varyg from 1 to 1. Value of oad Stuato 45 4 35 3 25 2 15 1 5 GUCP GS 1 2 3 4 5 6 7 8 9 1 Number of Users Fgure 5. 55 The Comparso of oad Coeffcet The comparso of load coeffcet betwee GS ad GUCP s show fgure 5, whch mples that the latter oe has much advatage over the former load balacg abltes. I fgure 6, we see that more cost should be pa the algorthm GUCP compared wth GS, whch s also acceptable for us. I fgure 7, we see that the average route dstace s lttle lager GUCP tha GS, but t s obvously stll acceptable. 6.3 Result of Numercal Expermet for Algorthm wthout Requests Iformato I the case that there s o users request formato to utlze, the PBP algorthm s to be chose. Two sets of expermets are carred out, wth the users beg a varable factor.pbp s compared wth Radom push cost ad average delay. I fgure 8 ad 9, we see that less average delay ad push cost s smaller PBP tha Radom. Hece, from the comparso, t s coclude that the PBP s more effectve. Cost 5 45 4 35 3 25 2 Average Delay 1.6 1.4 1.2 1.8.6 RANDOM PBP 15 GUCP GS 1 1 2 3 4 5 6 7 8 9 1 Number of Users Fgure 6. The Comparso of Cost.4.2 Fgure 8. 1 2 3 4 5 6 7 8 9 1 Number of Users The Comparso of Average Delay 1 9.8 9.6 GUCP GS 14 12 RANDOM PBP Average Route Dstace 9.4 9.2 9 8.8 8.6 Push Cost 1 8 6 8.4 4 8.2 2 Fgure 7. 8 1 2 3 4 5 6 7 8 9 1 Number of Users The Comparso of Average Route Dstace 1 2 3 4 5 6 7 8 9 1 Number of Users Fgure 9. 7. CONCUSIONS The Comparso of Push Cost 618

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 I ths paper, two offle replca placemet algorthms are proposed for cloud-based storage CDNs. The GUCP effectvely solved the load mbalace problems replca placemet compared wth the exsted GS algorthm. Whe there s o formato of users requests, a ew algorthm called PBP s proposed based o the popularty of vdeo cotet. It s show that the PBP has much advatage average delay ad push cost compare wth the Radom algorthm. Numercal expermets have demostrated the effectveess of the method above. ACKNOWEDGMENT The authors are grateful to the aoymous referees for ther valuable commets ad suggestos to mprove the presetato of ths paper. Ths wor was supported part by Natoal Key Techologes R&D Program of Cha (Grat No. 212BAH73F2) ad the Strategc Prorty Research Program of the Chese Academy of Sceces(Grat No. XDA639). REFERENCES [1] Raybur D. CDN prcg: Costs for outsourced vdeo delvery[j]. Streamg Meda West, 28. [2] Broberg J. Buldg Cotet Delvery Networs Usg Clouds[J]. Cloud Computg: Prcples ad Paradgms, 511-531. [3] M. Armbrust, A. Fox, R. Grffth, A. Joseph, R. Katz, A. Kows,G. ee, D. Patterso, A. Rab, I. Stoca et al., Above the clouds:a bereley vew of cloud computg, EECS Departmet, Uversty of Calfora, Bereley, Tech. Rep. UCB/EECS-29-28, 29. [4] Elso J, Howell J. Hadlg flash crowds from your garage[c]//usenix ATC. 28, 4. [5] D.Gottfrd.Self-servce,prorated superc omputg fu! OPEN: All the code tha s ft to prtf().http://ope.mytme.com/27/11/1/se lf-servce-prorated-super-com-puttgfu/,1/11/27 [6] MacAsll D. Scalablty: Set Amazo s servers o fre, ot yours[c]//etech 27: O Relly Emergg Techology Coferece. 27. [7] J. Broberg, R. Buyya, ad Z. Tar, MetaCDN: Haressg Storage Clouds for hgh performace cotet delvery, Joural of Networ ad Computer Applcatos, vol. 32, o. 5, pp. 112 122, 29. [8] J. Broberg, S. Veugopal, ad R. Buyya, Maret-oreted grds ad utlty computg: The state-of-the-art ad future drectos, Joural of Grd Computg, vol. 6, o. 3, pp. 255 276, 28. [9] Neves T A, Drummod, Och S, et al. Solvg replca placemet ad request dstrbuto cotet dstrbuto etwors[j]. Electroc Notes Dscrete Mathematcs, 21, 36: 89-96. [1] B, Gol M J, Italao G F, et al. O the optmal placemet of web proxes the teret[c]//infocom'99. Eghteeth Aual Jot Coferece of the IEEE Computer ad Commucatos Socetes. Proceedgs. IEEE. IEEE, 1999, 3: 1282-129. [11] Krsha P, Raz D, Shavtt Y. The cache locato problem[j]. IEEE/ACM Trasactos o Networg (TON), 2, 8(5): 568-582. [12] Qu, Padmaabha V N, Voeler G M. O the placemet of web server replcas[c]//infocom 21. Tweteth Aual Jot Coferece of the IEEE Computer ad Commucatos Socetes. Proceedgs. IEEE. IEEE, 21, 3: 1587-1596. [13] Ja X, D, Hu X, et al. Placemet of webserver proxes wth cosderato of read ad update operatos o the teret[j]. The Computer Joural, 23, 46(4): 378-39. [14] Xu J, B, ee D. Placemet problems for trasparet data replcato proxy servces[j]. Selected Areas Commucatos, IEEE Joural o, 22, 2(7): 1383-1398. [15] Cdo I, Kutte S, Soffer R. Optmal allocato of electroc cotet[j]. Computer Networs, 22, 4(2): 25-218. [16] Kalpas K, Dasgupta K, Wolfso O. Optmal placemet of replcas trees wth read, wrte, ad storage costs[j]. Parallel ad Dstrbuted Systems, IEEE Trasactos o, 21, 12(6): 628-637. [17] Papaga C, evadeas A, Papavasslou S. A Cloud-oreted Cotet Delvery Networ Paradgm: Modelg ad Assessmet[J]. 213. [18] Che F, Guo K, J, et al. Itra-cloud lghtg: Buldg CDNs the cloud[c]//infocom, 212 Proceedgs IEEE. IEEE, 212: 433-441. [19] Guo Cheg-cheg Ya Pu-lu.A Dyamc oad-bal ac g Al gort hmf or Heterogeeous Web Server Cluster[J].Chese Joural of Computers, 25,28(2):179-184 619

Joural of Theoretcal ad Appled Iformato Techology 31 st Jauary 214. Vol. 59 No.3 25-214 JATIT & S. All rghts reserved. ISSN: 1992-8645 www.att.org E-ISSN: 1817-3195 [2] Chvatal V. A greedy heurstc for the setcoverg problem[j]. Mathematcs of operatos research, 1979, 4(3): 233-235. [21] Godfrey B, ashmarayaa K, Suraa S, et al. oad balacg dyamc structured P2P systems[c]//infocom 24. Twety-thrd AualJot Coferece of the IEEE Computer ad Commucatos Socetes. IEEE, 24, 4: 2253-2262. [22] Wagstaff K, Carde C, Rogers S, et al. Costraed -meas clusterg wth bacgroud owledge[c]//icm. 21, 1: 577-584. [23] V.S.RAMASUBRAMANIAN, Cost-Aware Resource Maagemet for Decetralzed tralzed Iteret Servces, PhD thess, Corell Uversty, Jauary 27. [24] Sha Y, uzhag Z, Mg Z. The popularty of moves predct system based o data mg techology for CDN[C]//Computer Scece ad Iformato Techology (ICCSIT), 21 3rd IEEE Iteratoal Coferece o. IEEE, 21, 7: 64-67. [25] J, Hog S, Xa S, et al. Neural Networ Based Popularty Predcto For IPTV System[J]. Joural of Networs, 212, 7(12): 251-256. [26] Zhou X, Xu C Z. Optmal vdeo replcato ad placemet o a cluster of vdeo-odemad servers[c]//parallel Processg, 22. Proceedgs. Iteratoal Coferece o. IEEE, 22: 547-555. [27] Hefeeda M, Noorzadeh B. O the beefts of cooperatve proxy cachg for peer-to-peer traffc[j]. Parallel ad Dstrbuted Systems, IEEE Trasactos o, 21, 21(7): 998-11. [28] Zhuo You, VOD system access behavor research: measuremet, aalyss ad modelg, PhD thess, Uversty of Scece ad Techology of Cha, Jauary 27. [29] Borst S, Gupta V, Wald A. Dstrbuted cachg algorthms for cotet dstrbuto etwors[c]//infocom, 21 Proceedgs IEEE. IEEE, 21: 1-9. 62