JOURNAL OF SOFTWARE, VOL. 8, NO. 7, JULY 2013 1765

A Priority Queue Algorithm for the Replication Task in HBase

Changlun Zhang
Science School, Beijing University of Civil Engineering and Architecture, Beijing, China
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
Email: zclun@bucea.edu.cn

Kaiyuan Wang and Haibing Mu
School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, China
Email: hbmu@bjtu.edu.cn

Abstract: The replication of non-structured data from one data center to another is an urgent task in HBase. This paper studies the priority growth probability of the priority replication queue and proposes a dynamic priority replication task queue algorithm based on the earliest deadline first (EDF) algorithm. The experimental results show that the proposed algorithm balances the replication overhead between high-priority and low-priority tasks, keeps low-priority tasks from starving to death, and protects the interests of high-priority tasks.

Index Terms: non-structured data, HBase, replication queue, earliest deadline first algorithm (EDF)

I. INTRODUCTION

The interaction among users generates more and more non-structured data in the Web 2.0 era. Such data have no specific structure and cannot be described in a fixed format; examples are micro-blogging messages (including @ mentions, hyperlinks, pictures, etc.), XML files and so on. Internet companies build large numbers of gigantic data centers around the world to store these data. The number of hosts in a single data center ranges from several hundred to tens of thousands. Google has more than 50 data centers and millions of servers [1] to store the massive amounts of non-structured data that its customers around the world produce every day. Managing and using these data is a big challenge that covers data reading, data storing, data indexing, data addressing, and the interface for data configuration and management; data replication among multiple data centers is particularly urgent.
BigTable [2,3] is a distributed storage system developed at Google for managing structured data. It can scale to a very large size: petabytes of data across thousands of commodity servers. The ability to store structured data without first defining a schema gives developers greater flexibility when building applications and eliminates the need to refactor an entire database as those applications evolve. However, BigTable does not by itself manage non-structured data. HBase [4-6] is the Hadoop database, an Apache open source project whose goal is to provide BigTable-like storage. HBase is for storing huge amounts of structured or semi-structured data. Data is logically organized into tables, rows and columns. Columns may have multiple versions for the same row key. The data model is similar to that of BigTable.

As with traditional data transmission or routing services, different data replication tasks among distributed data centers have different QoS requirements. Column families may belong to different businesses and users, and columns have different priorities; the requirements for delay, bandwidth, and availability vary with the business and the data. It is therefore necessary to provide a replication task management mechanism based on task priority for HBase data replication protocols. Such a mechanism can implement dynamic priority management for the cache queue of replication tasks.

Luo [10] presents a probability-priority hierarchical scheduling algorithm. Compared to the priority queue scheduling algorithm [11,12], that algorithm on the one hand guarantees the delay and data loss performance of high-priority packet groups and on the other hand improves the packet loss performance of low-priority packet groups. In this paper, we propose a dynamic priority scheduling scheme based on a sequence of priority growth probabilities, with reference to the priority sequence of the earliest deadline first algorithm (EDF) [7-9]. The priority is divided into three levels: low, middle and high, with correction.
The dynamic priority scheduling method balances the replication overhead between high-priority and low-priority tasks, keeps low-priority tasks from starving to death, and protects the interests of high-priority tasks.

The rest of the paper is organized as follows. Section 2 introduces the EDF algorithm, which is the basis of our algorithm in the next section. The main work on the dynamic priority scheduling scheme is described in Section 3. Section 4 gives the experiment and analysis of the proposed algorithm. Section 5 concludes the paper.

doi:10.4304/jsw.8.7.1765-1769
II. AN OVERVIEW OF EDF

The earliest deadline first scheduling algorithm (EDF) [7-9] is widely used as a priority scheduling algorithm. It calculates priority from the task deadline and assigns a higher priority to the task closer to its deadline. The task with the highest priority is guaranteed to run at every moment. EDF is a dynamic priority scheduling algorithm because the deadlines of the tasks in the buffer queue may change as time goes by. EDF can always obtain a feasible schedule as long as one exists. In other words, if EDF cannot generate a feasible schedule, no other feasible schedule exists.

In every new ready state, EDF selects the task with the earliest deadline from the tasks that are ready but not yet fully processed, and allocates the required resources to it. When new tasks arrive, the scheduler immediately recalculates the deadlines and produces a new priority order. It may take the processor away from the running task, and it decides whether to schedule a new task according to the new task's deadline. The new task may be processed immediately if its deadline is earlier than that of the current task. Under EDF, the processing of the interrupted task resumes later.

EDF has a simple necessary and sufficient condition for schedulability: the task set is schedulable if and only if the load U of the periodic task set is not greater than 1. EDF can then generate a feasible schedule. It has the following characteristics:

1) Task model: the same as RMS (Rate-Monotonic Scheduling);
2) Priority assignment method: priority is allocated dynamically; the nearer the deadline, the higher the priority;
3) Schedulability: if the task set satisfies $\sum_{i=1}^{n} C_i / T_i \le 1$, the task set is schedulable.

EDF has been shown to be the optimal dynamic scheduling algorithm, with a necessary and sufficient schedulability condition. It reaches up to 100% CPU utilization, at the cost of more online scheduling overhead than RMS.
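The schedulability condition in item 3 can be checked directly. The following is a minimal sketch, not code from the paper: the `Task` type, its field names, the helper functions and the sample task set are all assumptions introduced here for illustration, and each deadline is assumed to equal the task's period.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Task:
    """Hypothetical periodic task (illustrative type, not from the paper)."""
    name: str
    c: int  # execution time C_i
    t: int  # period T_i (deadline assumed equal to the period)


def utilization(tasks):
    """Total load U = sum(C_i / T_i), kept exact with Fraction."""
    return sum((Fraction(task.c, task.t) for task in tasks), Fraction(0))


def edf_schedulable(tasks):
    """EDF test for independent, preemptible periodic tasks on one processor."""
    return utilization(tasks) <= 1


def earliest_deadline(ready):
    """Pick the ready job with the earliest absolute deadline.

    `ready` is a list of (absolute_deadline, name) pairs.
    """
    return min(ready)


if __name__ == "__main__":
    tasks = [Task("a", 1, 4), Task("b", 2, 6), Task("c", 3, 10)]
    print(utilization(tasks))      # 53/60
    print(edf_schedulable(tasks))  # True
```

Exact fractions are used so that a load of exactly 1 is not misjudged because of floating-point rounding.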
The EDF scheduling algorithm rests on the following assumptions:

1) The preemption cost is very small;
2) Only the processing requirements are significant; the I/O, memory and other resource requirements can be ignored;
3) All tasks are independent; there is no precedence constraint among them.

These assumptions simplify the analysis of EDF. Assumption 1 means that a task can be preempted at any time without loss and resumed later, and that the number of times a task is preempted does not change the overall workload of the processor. Assumption 2 means that, to check feasibility, it suffices to ensure enough processing capacity to complete the tasks within their time limits; no other factor complicates the problem. Assumption 3 specifies that no precedence constraint exists, which means that the release time of a task is independent of the end time of other tasks. For a system that does not meet these three assumptions, precedence and exclusion constraints must be taken into account.

EDF is the optimal dynamic scheduling algorithm on a single processor. The upper limit of its schedulable utilization is 100%; that is to say, if EDF cannot feasibly schedule a task set on a single processor, then no other scheduling algorithm can either.

III. PROBABILITY PRIORITY GROWTH ALGORITHM BASED ON EDF

A. Priority of Replication Task in HBase

In the distributed database HBase, the column family data to be replicated and synchronized may differ in importance because of requirements and business types. It is necessary to assign different priorities to different column families or their columns according to the different QoS requirements [13-15], in order to distinguish data of different priority during data replication among data centers. An improved priority queue for HBase replication can be constructed according to the theory of EDF. The priority of column family data may be broken down to its columns. It may also differ because the data come from different users or times.
The replicated data to be stored by tasks of different priorities may belong to one column family. Merging the tasks that store the same column family therefore not only reduces the queue length but also makes sending, storing, and reading more batched, continuous and rapid.

In some implementations of the priority queue, the priority of each task is statically configured and does not change over time. When many high-priority tasks are queued, a low-priority task is likely to be "starved to death" because it never gets service. Following the idea of EDF, the priority of a task should be adjusted dynamically and increase over time. The dynamic priority works as follows:

1) Each replication task joining the queue is given an initial priority;
2) The priority of every task increases with every period;
3) Each time, the task with the highest priority in the queue is selected to be replicated.

Assume that there are a total of N priorities from 1 to N, where 1 is the lowest priority and N is the highest: the bigger the number, the higher the priority.

From the above, the priority of a replication task for an HBase column family should include both the task's merging priority and the priority that grows over time.

B. Priority of the Merger Task

Multiple replication tasks are merged when they share the same index of the storage location, that is, they belong to the same column family but have different priorities. Obviously, the priority of the merged task is at least equal to the highest priority of the original tasks, to
ensure that the original important task keeps a high priority. If the priority of task i is $t_i$ and the maximum number of tasks to merge is MAX, then the priority T of the tasks 1, 2, ..., i, ..., n after being merged is:

$$T = \max\{t_1, t_2, \ldots, t_i, \ldots, t_n\} \quad (n \le \mathrm{MAX}) \tag{1}$$

C. Simple Priority Queue Algorithm Based on EDF

Assume a total of N priorities from 1 to N, where 1 is the lowest priority and N is the highest. In the simple priority queue algorithm based on EDF (SPQA), the priority of every task in the queue is increased by 1 every period. Each time, the task with the highest priority is selected for execution. This dynamic priority keeps low-priority tasks from dying of starvation, but it is unfair to high-priority tasks because the priority of low-priority tasks grows too fast. For example, with eight priorities from 1 to 8, a task with an initial priority of 5 may grow to 7 after two periods in the buffer queue. If a new task with priority 7 then arrives, it will not be replicated first, because the task with initial priority 5 takes the chance.

To solve this problem, the time sensitivity of low priorities is lowered, which reduces the growing rate of low-priority tasks as shown in Figure 1. We can construct a priority growth probability sequence $\{P_i\}$ ($i = 1, 2, \ldots, N$) based on the total number of priorities N, such that low-priority tasks have a low priority growth probability and high-priority tasks have a high priority growth probability.

Figure 1. Priority growth probability (axes: priority, growth probability)

D. Probability Priority Growth Algorithm Based on EDF (PPGA)

To give low-priority tasks a lower priority growth probability and high-priority tasks a higher one, we intend to construct a curve whose shape is similar to Figure 1 to determine the priority growth probability. In the figure, the slope of the first half of the curve is greater than 1 while that of the latter half is less than 1. The growing rate Y(x) of the curve is established with the following conditions:

$$Y(N) = 0, \qquad \int_0^N Y(x)\,dx = 1 \tag{2}$$

Figure 2. Reduce rate of priority growth probability (axes: priority, growing rate)

Assuming that the linear function is $Y(x) = ax + b$ ($0 \le x \le N$, x is an integer), we can obtain:

$$a = -\frac{2}{N^2}, \qquad b = \frac{2}{N} \tag{3}$$

So,

$$Y(x) = -\frac{2}{N^2}x + \frac{2}{N} \quad (0 \le x \le N,\ x \text{ is an integer}) \tag{4}$$

Let $P_N = 1$ and solve for the priority growth probability of priority i as follows:

$$\begin{aligned} P_{N-1} &= P_N - Y(N-1) \\ P_{N-2} &= P_{N-1} - Y(N-2) \\ &\;\;\vdots \\ P_i &= P_{i+1} - Y(i) \end{aligned} \tag{5}$$

Adding up the left and right sides of the above equations respectively gives:

$$P_i = 1 - \sum_{k=i}^{N-1} Y(k) = 1 - \frac{2}{N^2}\sum_{k=i}^{N-1}(N-k) = 1 - \frac{(N-i)(N-i+1)}{N^2} \quad (0 < i \le N) \tag{6}$$

For example, we construct a priority growth probability sequence with eight priorities, calculate the reduce rate of each priority growth probability by equation (4), and get the priority growth probability of each priority by equation (6). The results are shown in Table I and Figure 3.

The priority growth probability does not show directly the average number of periods between two adjacent priorities. That number can be calculated as the expectation E(n):

$$E(n) = P + 2(1-P)P + 3(1-P)^2 P + \cdots = P\sum_{k=1}^{\infty} k(1-P)^{k-1} = -P\left(\sum_{k=0}^{\infty}(1-P)^k\right)' = \frac{1}{P} \tag{7}$$
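The quantities in equations (4), (6) and (7) can be evaluated numerically. The sketch below is illustrative and not taken from the paper. It assumes the linear reduce rate Y(i) = 2(N - i)/N^2, the recurrence P_N = 1, P_i = P_{i+1} - Y(i), and an average period of 1/P_i. It further assumes that PPGA promotes a waiting task by one level per period with probability P_i, while SPQA promotes every task by one level per period. All function names are hypothetical.

```python
from fractions import Fraction
import random


def reduce_rate(i, n):
    """Reduce rate Y(i) = 2(N - i) / N^2 of the growth probability at priority i."""
    return Fraction(2 * (n - i), n * n)


def growth_probabilities(n):
    """Return {priority: P_i} from P_N = 1 and P_i = P_{i+1} - Y(i)."""
    probs = {n: Fraction(1)}
    for i in range(n - 1, 0, -1):
        probs[i] = probs[i + 1] - reduce_rate(i, n)
    return probs


def average_period(p):
    """Expected number of periods before a promotion: E(n) = 1 / P."""
    return 1 / p


def spqa_step(priority, n):
    """One period of SPQA: the task is promoted by one level, capped at N."""
    return min(priority + 1, n)


def ppga_step(priority, probs, rng):
    """One period of PPGA (assumed reading): promote with probability P_priority."""
    n = max(probs)
    if priority < n and rng.random() < probs[priority]:
        return priority + 1
    return priority


if __name__ == "__main__":
    N = 8
    probs = growth_probabilities(N)
    for i in range(1, N + 1):
        print(i, reduce_rate(i, N), probs[i], average_period(probs[i]))

    rng = random.Random(0)  # fixed seed so the run is repeatable
    for start in (1, 4, 7):
        spqa = ppga = start
        for _ in range(8):
            spqa = spqa_step(spqa, N)
            ppga = ppga_step(ppga, probs, rng)
        print(start, "->", spqa, "(SPQA)", ppga, "(PPGA)")
```

Note that `Fraction` prints values in lowest terms, so 4/32 appears as 1/8. The PPGA result is random, so a single run only illustrates the trend: under these assumptions, a task at priority 1 waits eight periods on average before its first promotion.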
TABLE I. PRIORITY GROWTH PROBABILITY

Priority   Reduce rate   Growth probability   Average period
1          7/32          4/32                 32/4
2          6/32          11/32                32/11
3          5/32          17/32                32/17
4          4/32          22/32                32/22
5          3/32          26/32                32/26
6          2/32          29/32                32/29
7          1/32          31/32                32/31
8          0             1                    1

As shown in Figure 3, the probability sequence constructed for the priority growth is increasing and nonlinear: the priority growth probability of a low-priority task is small while that of a high-priority task is large, which protects the interests of high-priority tasks. It can be seen from the slopes of the eight-segment broken line that a low-priority task has a small priority growth probability but a high growing rate of that probability, and that the slope slowly decreases as the priority grows. A low-priority task therefore gains a larger increase in growth probability, which ensures that it will not be starved to death.

Figure 3. Priority Growth Probability of SPQA and PPGA (axes: priority, growth probability)

E. Analysis of Experimental Results

The effects of the priority growth probability are compared in the following experiments. Here, the tasks whose priorities are 1, 4 and 7 are chosen from the heap of the maximum priority queue according to the priority growth probability.

As shown in Figure 4, the priority of every task in the queue is increased by 1 every period in SPQA, and the priority of low-priority tasks grows too fast. After eight periods, the task with high priority 7 has increased to 8 in one period, the task with priority 4 has increased to 8 in five periods, and the task with lower priority 1 has increased to 8 in eight periods.

PPGA changes this fast growth of priority. In Figure 5, the task with high priority 7 still increases to 8 in one period, but the task with lower priority 1 shows no change until the eighth period and only then increases to 2. PPGA balances the replication overhead between high-priority and low-priority tasks.

Figure 4. Priority Increase in SPQA (axes: period, priority)

Figure 5. Priority Increase in PPGA (axes: period, priority)

IV. CONCLUSIONS

We studied the priority queue theory of the earliest deadline first scheduling algorithm and the storage characteristics of column families in HBase, in which the column family is the unit of replication tasks.
A priority queue algorithm based on the priority growth probability is established; its priority growth probability sequence is evenly distributed in the interval [0, 1]. The growing rate of the probability is large for low priorities and small for high priorities. The algorithm balances the replication overhead between high-priority and low-priority tasks, keeps low-priority tasks from starving to death, and protects the interests of high-priority tasks.

ACKNOWLEDGMENT

This work is supported by the Beijing Municipal Organization Department talents training funded project (00D0050700000), the Beijing Institute of Architectural Engineering School research fund (Z0053) and the Jilin University Key Laboratory of
Symbolic Computation and Knowledge Engineering of Ministry of Education research fund (93K-7-0-0).

REFERENCES

[1] http://www.cnbeta.com/articles/7330.htm.
[2] Chang, F., Dean, J., Ghemawat, S., et al., Bigtable: A distributed storage system for structured data, ACM Transactions on Computer Systems (TOCS), 2008, 26(2): 4.
[3] Ankur Khetrapal, Vinay Ganesh, HBase and Hypertable for large scale distributed storage systems: A Performance evaluation for Open Source BigTable Implementations, from Internet.
[4] Dhruba Borthakur, The Hadoop Distributed File System: Architecture and Design, Available at http://wiki.apache.org/hadoop.
[5] HadoopDB Project, Available at http://db.cs.yale.edu/hadoopdb/hadoopdb.html.
[6] Azza Abouzeid, et al., HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, Proceedings of VLDB '09, 2009, Lyon, France, pp. 922-933.
[7] Zhi Quan, Jong-Moon Chung, A Statistical Framework for EDF Scheduling, IEEE Communications Letters, vol. 7, no. 10, October 2003, pp. 493-495.
[8] Victor Firoiu, Jim Kurose, Don Towsley, Efficient Admission Control of Piecewise Linear Traffic Envelopes at EDF Schedulers, IEEE/ACM Transactions on Networking, vol. 6, no. 5, October 1998, pp. 558-570.
[9] Jianjun Li, et al., Workload Efficient Deadline and Period Assignment for Maintaining Temporal Consistency under EDF, IEEE Transactions on Computers, pp.- 4.
[10] Luo hume, Gao qiang, Song shuang, The study of hierarchical packet scheduling algorithm on probability-priority, Computer Applications and Software, 0, (7), pp.57-59.
[11] Jiang Y, Tham C K, Ko C C, A probabilistic priority scheduling discipline for multi-service networks [C]. Proc. of IEEE ISCC '01. Tunisia: [S n], 2001, pp. 0450.
[12] Tham C K, Yao Q, Ko C C, Achieving differentiated services through multi-class probabilistic priority scheduling [J]. Computer Networks, 2002, 40(4): 577-593.
[13] Guangjun Guo, Fei Yu, Zhigang Chen, Dong Xie, A Method for Semantic Web Service Selection Based on QoS Ontology. Journal of Computers, Vol 6, No 2 (2011), 377-386, Feb 2011.
[14] Elarbi Badidi, Larbi Esmahi, A Scalable Framework for Policy-based QoS Management in SOA Environments. Journal of Software, Vol 6, No 4 (2011), 544-553, Apr 2011.
[15] Bin Li, Yan Xu, Jun Wu, Junwu Zhu, A Petri-net and QoS Based Model for Automatic Web Service Composition. Journal of Software, Vol 7, No 1 (2012), 149-155, Jan 2012.

Zhang Changlun was born in Jining, Shandong Province of China in 97 and earned a Ph.D. degree at Beijing Jiaotong University of China in 2009. He is now a lecturer in the Science School, Beijing University of Civil Engineering and Architecture. His research focuses on network information security, network public opinion and software engineering.