A Priority Queue Algorithm for the Replication Task in HBase




JOURNAL OF SOFTWARE, VOL. 8, NO. 7, JULY 2013 1765

Changlun Zhang
Science School, Beijing University of Civil Engineering and Architecture, Beijing, China
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
Email: zclun@bucea.edu.cn

Kaixuan Wang and Haibing Mu
School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, China
Email: hbmu@bjtu.edu.cn

Abstract: The replication of non-structured data from one data center to another is an urgent task in HBase. This paper studies the priority growth probability of the priority replication queue and proposes a dynamic priority replication task queue algorithm based on the earliest deadline first (EDF) algorithm. The experimental results show that the proposed algorithm balances the replication overhead between high-priority and low-priority tasks, and keeps low-priority tasks from starving to death while protecting the interests of high-priority tasks.

Index Terms: non-structured data, HBase, replication queue, earliest deadline first (EDF) algorithm

I. INTRODUCTION

The interaction among users generates more and more non-structured data in the Web 2.0 era. These data have no specific structure and cannot be described in a fixed format; examples include microblog messages (with @ mentions, hyperlinks, pictures, etc.) and XML files. Internet companies establish large numbers of gigantic data centers around the world to store these data. The number of hosts in a single data center ranges from several hundred to tens of thousands. Google has more than 50 data centers and 0 million servers [1] to store the massive amounts of non-structured data that its customers around the world produce daily. Managing and using these data is a big challenge that covers data reading, storing, indexing and addressing as well as the interfaces for data configuration and management; data replication among multiple data centers is particularly urgent.
BigTable [2,3] is a distributed storage system developed at Google for managing structured data. It is designed to scale to a very large size: petabytes of data across thousands of commodity servers. BigTable's ability to store structured data without first defining a schema gives developers greater flexibility when building applications and eliminates the need to re-factor an entire database as those applications evolve. However, BigTable cannot manage structured data. HBase [4-6] is the Hadoop database, an Apache open source project whose goal is to provide BigTable-like storage. HBase is designed for storing huge amounts of structured or semi-structured data. Data is logically organized into tables, rows and columns, and columns may have multiple versions for the same row key. The data model is similar to that of BigTable.

As with traditional data transmission or routing services, different data replication tasks among distributed data centers have different QoS requirements: column families may belong to different businesses and users, and columns differ in priority and in their requirements for delay, bandwidth, and availability. Therefore, it is necessary to provide a replication task management mechanism based on task priority for HBase data replication protocols, one that implements dynamic priority management for the cache queue of replication tasks.

Luo [10] presents a probability-priority hierarchical scheduling algorithm. Compared to the priority queue scheduling algorithm [11,12], this algorithm guarantees the delay and data loss performance of high-priority packet groups while improving the packet loss performance of low-priority packet groups. In this paper, we propose a dynamic priority scheduling scheme based on a sequence of priority growth probabilities, referring to the priority ordering of the earliest deadline first (EDF) algorithm [7-9]. The priority is divided into three levels: low, middle and high with correcting.
The dynamic priority scheduling method balances the replication overhead between high-priority and low-priority tasks, and keeps low-priority tasks from starving to death while protecting the interests of high-priority tasks.

The rest of the paper is organized as follows. Section II introduces the EDF algorithm, which is the basis of our algorithm. Section III describes the dynamic priority scheduling scheme and analyzes the experimental results. Section IV concludes the paper.

doi:10.4304/jsw.8.7.1765-1769

II. AN OVERVIEW OF EDF

The earliest deadline first (EDF) scheduling algorithm [7-9] is widely used as a priority scheduling algorithm. It computes priority from the task deadline and assigns a higher priority to a task that is closer to its deadline. The task with the highest priority is guaranteed to run at every moment. EDF is a dynamic priority scheduling algorithm because the deadline order of the tasks in the buffer queue may change as time goes by. EDF can always find a feasible schedule as long as one exists. In other words, if EDF cannot generate a feasible schedule, no other feasible schedule exists.

In every new ready state, EDF selects the task with the earliest deadline from the tasks that are ready but not yet fully processed, and allocates the required resources to it. When new tasks arrive, the scheduler immediately recomputes the deadlines and produces a new priority order. It can take the processor away from the running task, and it decides whether to schedule a new task according to the new task's deadline: the new task is processed immediately if its deadline is earlier than that of the current task, and the processing of the interrupted task resumes later.

EDF has a simple necessary and sufficient schedulability condition: the load U of the periodic task set must not be greater than 1. EDF generates feasible schedules and has the following characteristics:

1) Task model: the same as RMS (Rate-Monotonic Scheduling);
2) Priority assignment method: priority is allocated dynamically; the nearer the deadline, the higher the priority;
3) Schedulability: if the task set satisfies $\sum_{i=1}^{n} C_i / T_i \le 1$, where $C_i$ is the execution time and $T_i$ the period of task $i$, the task set is schedulable.

EDF has been shown to be the optimal dynamic scheduling algorithm, with a necessary and sufficient schedulability condition. It can reach up to 100% CPU utilization, at the cost of more online scheduling overhead than RMS.
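The schedulability condition and the "earliest deadline first" selection rule described above can be sketched in a few lines of Python. This is an illustration only and is not taken from the paper: the function names `utilization`, `edf_schedulable` and `edf_order`, and the sample task sets, are assumptions. A periodic task is a pair (C, T) of integer execution time and period, and a ready job is a pair (deadline, name).

```python
import heapq
from fractions import Fraction


def utilization(tasks):
    """Load U of a periodic task set: the sum of C_i / T_i over all tasks."""
    return sum(Fraction(c, t) for c, t in tasks)


def edf_schedulable(tasks):
    """EDF condition on a single processor: schedulable if and only if U <= 1."""
    return utilization(tasks) <= 1


def edf_order(jobs):
    """Return job names in the order EDF would serve them (earliest deadline first)."""
    heap = list(jobs)          # copy so the caller's list is not modified
    heapq.heapify(heap)        # min-heap keyed on the deadline
    order = []
    while heap:
        _deadline, name = heapq.heappop(heap)
        order.append(name)
    return order


if __name__ == "__main__":
    light = [(1, 4), (1, 3), (1, 3)]   # U = 11/12
    heavy = [(2, 4), (3, 6), (1, 3)]   # U = 4/3
    print(utilization(light), edf_schedulable(light))
    print(utilization(heavy), edf_schedulable(heavy))
    print(edf_order([(9, "c"), (3, "a"), (5, "b")]))
```

Exact fractions are used instead of floats so that a task set with a load of exactly 1 is not rejected because of rounding.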
The EDF scheduling algorithm rests on the following assumptions:

1) The preemption cost is very small;
2) Only processing requirements are significant; I/O, memory and other resource requirements can be ignored;
3) All tasks are independent; there are no precedence constraints among them.

These assumptions simplify the analysis of EDF. Assumption 1 means that a task can be preempted at any time without loss and resumed later, and that the number of times a task is preempted does not change the overall workload of the processor. Assumption 2 means that checking feasibility reduces to ensuring sufficient processing capacity to perform the tasks within their time limits; no other factors complicate the problem. Assumption 3 specifies that no precedence constraints exist, which means that the release time of a task is independent of the end time of other tasks. For a system that does not meet these three assumptions, precedence and exclusion constraints must be taken into account.

EDF is the optimal dynamic scheduling algorithm on a single processor. The upper limit of its schedulable utilization is 100%; that is to say, if EDF cannot schedule a task set on a single processor, no other scheduling algorithm can schedule it either.

III. PROBABILITY PRIORITY GROWTH ALGORITHM BASED ON EDF

A. Priority of Replication Task in HBase

In the distributed database HBase, column family data to be replicated and synchronized may differ in importance because of requirements and business types. It is necessary to set different priorities for different column families or their columns according to the different QoS requirements [13-15], in order to distinguish data of different priority during replication among data centers. An improved priority queue for HBase replication can be constructed according to the theory of EDF. The priority of column family data may be broken down to its columns, and it may also differ because the data come from different users or times.
The replicated data to be stored by tasks of different priority may belong to one column family. Merging the tasks that store the same column family therefore not only reduces the queue length but also makes sending, storing and reading more batched, continuous and rapid.

In some implementations of the priority queue, the priority of each task is statically configured and does not change over time. When many high-priority tasks are queued, a low-priority task is likely to be "starved to death" because it never gets service. Following the idea of EDF, the priority of a task should be adjusted dynamically and increase over time. The dynamic priority is implemented as follows:

1) Each replication task joining the queue is given an initial priority;
2) The priority of every task increases every period;
3) Each time, the task with the highest priority in the queue is selected to be replicated.

Assume that there are N priorities in total, from 1 to N, where 1 is the lowest priority and N the highest; the bigger the number, the higher the priority. Accordingly, the priority of a replication task for an HBase column family should combine the task's merging priority and its priority growth over time.

B. Priority of the Merged Task

Multiple replication tasks are merged when they share the same storage location index, that is, when they belong to the same column family but have different priorities. Obviously, the priority of the merged task must be at least equal to the highest priority of the original tasks, to ensure that the original important task keeps a high priority. If the priority of task $i$ is $t_i$ and the maximum number of tasks to merge is MAX, then the priority $T$ of the task obtained by merging tasks 1, 2, ..., n is:

$$T = \max\{t_1, t_2, \ldots, t_n\} \quad (n \le MAX) \tag{1}$$

C. Simple Priority Queue Algorithm Based on EDF

Assume a total of N priorities from 1 to N, where 1 is the lowest priority and N the highest. In the simple priority queue algorithm based on EDF (SPQA), the priority of each task in the queue is increased by 1 every period, and each time the task with the highest priority is selected for execution. This dynamic priority keeps low-priority tasks from dying of starvation, but it is unfair to high-priority tasks because the priority of low-priority tasks grows too fast. For example, with eight priorities from 1 to 8, a task with an initial priority of 5 grows to 7 after two periods in the buffer queue. If a new task with priority 7 then arrives, it will not be replicated first, because the task with the initial priority of 5 takes the chance.

To solve this problem, the sensitivity of low-priority tasks to time is lowered, which reduces their growth rate as shown in Figure 1. We can construct a priority growth probability sequence $\{P_i\}$ ($i = 1, 2, \ldots, N$) based on the total number of priorities N, so that low-priority tasks have a lower priority growth probability and high-priority tasks a higher one.

Figure 1. Priority growth probability (x-axis: priority; y-axis: growth probability).

D. Probability Priority Growth Algorithm Based on EDF (PPGA)

To give low-priority tasks a lower priority growth probability and high-priority tasks a higher one, we construct a curve whose shape is similar to Figure 1 to determine the priority growth probability. In the figure, the slope of the first half of the curve is greater than 1 while that of the latter half is less than 1. The curve is established under the following conditions on the growing rate $Y(x)$:

$$Y(N) = 0, \qquad \int_0^N Y(x)\,dx = 1 \tag{2}$$

Figure 2. Reduce rate of priority growth probability (x-axis: priority; y-axis: growing rate).

Assuming that the linear function is $Y(x) = ax + b$ ($0 \le x \le N$, $x$ is an integer), we obtain:

$$a = -\frac{2}{N^2}, \qquad b = \frac{2}{N} \tag{3}$$

So,

$$Y(x) = -\frac{2}{N^2}x + \frac{2}{N} \quad (0 \le x \le N,\ x \text{ is an integer}) \tag{4}$$

Let $P_N = 1$ and solve for the priority growth probability of priority $i$ as follows:

$$P_{N-1} = P_N - Y(N-1), \quad P_{N-2} = P_{N-1} - Y(N-2), \quad \ldots, \quad P_i = P_{i+1} - Y(i) \tag{5}$$

Adding up the left and right sides of the above equations respectively gives:

$$P_i = 1 - \sum_{k=i}^{N-1} Y(k) = 1 - \frac{(N-i)(N-i+1)}{N^2} \quad (0 < i \le N) \tag{6}$$

For example, we construct a priority growth probability sequence with eight priorities, calculate the reduce rate of each priority by equation (4), and obtain the growth probability of each priority by equation (6). The results are shown in Table I and Figure 3.

The priority growth probability alone does not directly show the average number of periods a task spends between two priorities. This can be calculated as the expectation $E(n)$:

$$E(n) = P_i + 2(1-P_i)P_i + 3(1-P_i)^2 P_i + \cdots = \sum_{k=1}^{\infty} k(1-P_i)^{k-1}P_i = -P_i\left(\sum_{k=1}^{\infty}(1-P_i)^k\right)' = \frac{1}{P_i} \tag{7}$$
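The construction above can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: it assumes the linear reduce rate Y(k) = 2(N - k)/N^2, the recurrence P_N = 1, P_i = P_(i+1) - Y(i), the expected period 1/P_i, and the merge rule T = max{t_1, ..., t_n} with n <= MAX. All function names, and the `draw` hook used to inject a random number, are hypothetical. The `grow` function reads the growth probability as "a task at priority i moves up by one level in a period with probability P_i", which is one plausible interpretation of the scheme.

```python
from fractions import Fraction


def reduce_rate(k, n):
    """Y(k) = -2k/n^2 + 2/n, written as 2(n - k)/n^2."""
    return Fraction(2 * (n - k), n * n)


def growth_probabilities(n):
    """Return {i: P_i} for i = 1..n from P_n = 1 and P_i = P_(i+1) - Y(i)."""
    probs = {n: Fraction(1)}
    for i in range(n - 1, 0, -1):
        probs[i] = probs[i + 1] - reduce_rate(i, n)
    return probs


def expected_period(p):
    """Average number of periods spent at a level with growth probability p."""
    return 1 / p


def merged_priority(priorities, max_merge):
    """A merged task takes the highest priority of at most max_merge tasks."""
    if not 0 < len(priorities) <= max_merge:
        raise ValueError("number of merged tasks must be between 1 and max_merge")
    return max(priorities)


def grow(priority, n, probs, draw):
    """One period: raise the priority by 1 with probability probs[priority].

    `draw` is an assumed hook returning a float in [0, 1), e.g. random.random.
    A task already at the highest priority n stays there.
    """
    if priority < n and draw() < probs[priority]:
        return priority + 1
    return priority


if __name__ == "__main__":
    n = 8
    table = growth_probabilities(n)
    for i in sorted(table):
        # Fractions print in lowest terms, e.g. 4/32 appears as 1/8.
        print(i, reduce_rate(i, n), table[i], expected_period(table[i]))
```

For eight priorities the sketch gives growth probabilities that rise from 4/32 at priority 1 to 1 at priority 8, so a task at the lowest level waits 8 periods on average before moving up. Passing `random.random` as `draw` turns `grow` into a randomized step; passing a fixed function makes the behavior reproducible for testing.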

TABLE I. PRIORITY GROWTH PROBABILITY

Priority   Reduce rate   Growth probability   Average period
1          7/32          4/32                 32/4
2          6/32          11/32                32/11
3          5/32          17/32                32/17
4          4/32          22/32                32/22
5          3/32          26/32                32/26
6          2/32          29/32                32/29
7          1/32          31/32                32/31
8          0             1                    1

As shown in Figure 3, the probability sequence constructed for priority growth is increasing and nonlinear: the priority growth probability of a low-priority task is small while that of a high-priority task is large, which protects the interests of high-priority tasks. The slopes of the eight line segments show that a low-priority task has a small priority growth probability but a high growing rate of that probability, and that the slope slowly decreases as the priority grows. A low-priority task thus gains a larger increase in growth probability, which ensures that it will not be starved to death.

Figure 3. Priority growth probability of SPQA and PPGA (x-axis: priority; y-axis: growth probability).

E. Analysis of Experimental Results

The effects of the priority growth probability are compared in the following experiments. Tasks with priorities 1, 4 and 7 are chosen from the heap of the maximum priority queue according to the priority growth probability. As shown in Figure 4, in SPQA the priority of each task in the queue is increased by 1 every period, so the priority of a low-priority task grows too fast. Within eight periods, the task with high priority 7 increases to 8 after one period, the task with priority 4 increases to 8 after five periods, and the task with the lower priority 1 increases to 8 after eight periods. PPGA changes this fast growth. In Figure 5, the task with high priority 7 still increases to 8 after one period, but the task with the lower priority 1 does not change until the eighth period, and then increases only to 2. PPGA balances the replication overhead between high-priority and low-priority tasks.

Figure 4. Priority increase in SPQA (x-axis: period; y-axis: priority).

Figure 5. Priority increase in PPGA (x-axis: period; y-axis: priority).

IV. CONCLUSIONS

We studied the priority queue theory of the earliest deadline first scheduling algorithm and the storage characteristics of column families in HBase, in which the column family is the unit of replication tasks.
A priority queue algorithm based on the priority growth probability is established; its priority growth probability sequence is distributed over the interval [0, 1]. The growing rate of the probability is large for low priorities and small for high priorities. The algorithm balances the replication overhead between high-priority and low-priority tasks, and keeps low-priority tasks from starving to death while protecting the interests of high-priority tasks.

ACKNOWLEDGMENT

This work is supported by the Beijing Municipal Organization Department talents training funded project (00D0050700000), the Beijing Institute of Architectural Engineering School research fund (Z0053), and the Jilin University Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education research fund (93K-7-0-0).

REFERENCES

[1] http://www.cnbeta.com/articles/7330.htm.
[2] Chang, F., Dean, J., Ghemawat, S., et al., Bigtable: A distributed storage system for structured data, ACM Transactions on Computer Systems (TOCS), 2008, 26(2): 4.
[3] Ankur Khetrapal, Vinay Ganesh, HBase and Hypertable for large scale distributed storage systems: A Performance evaluation for Open Source BigTable Implementations, from Internet.
[4] Dhruba Borthakur, The Hadoop Distributed File System: Architecture and Design, Available at http://wiki.apache.org/hadoop.
[5] HadoopDB Project, Available at http://db.cs.yale.edu/hadoopdb/hadoopdb.html.
[6] Azza Abouzeid, et al., HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, Proceedings of VLDB 09, 2009, Lyon, France, pp. 922-933.
[7] Zhi Quan, Jong-Moon Chung, A Statistical Framework for EDF Scheduling, IEEE Communications Letters, Vol. 7, No. 10, October 2003, pp. 493-495.
[8] Victor Firoiu, Jim Kurose, Don Towsley, Efficient Admission Control of Piecewise Linear Traffic Envelopes at EDF Schedulers, IEEE/ACM Transactions on Networking, Vol. 6, No. 5, October 1998, pp. 558-570.
[9] Jianjun Li, et al., Workload Efficient Deadline and Period Assignment for Maintaining Temporal Consistency under EDF, IEEE Transactions on Computers, pp.- 4.
[10] Luo hume, Gao Qiang, Song Shuang, The study of hierarchical packet scheduling algorithm on probability-priority, Computer Applications and Software, 0, (7), pp.57-59.
[11] Jiang Y, Tham C K, Ko C C, A probabilistic priority scheduling discipline for multi-service networks[C]. Proc of IEEE ISCC 01. Tunisia: [S n], 2001, pp. 0450.
[12] Tham C K, Yao Q, Ko C C. Achieving differentiated services through multi-class probabilistic priority scheduling[J]. Computer Networks. 2002, 40(4): 577-593.
[13] Guangjun Guo, Fei Yu, Zhigang Chen, Dong Xie. A Method for Semantic Web Service Selection Based on QoS Ontology. Journal of Computers, Vol 6, No 2 (2011), 377-386, Feb 2011.
[14] Elarbi Badidi, Larbi Esmahi. A Scalable Framework for Policy-based QoS Management in SOA Environments. Journal of Software, Vol 6, No 4 (2011), 544-553, Apr 2011.
[15] Bin Li, Yan Xu, Jun Wu, Junwu Zhu. A Petri-net and QoS Based Model for Automatic Web Service Composition. Journal of Software, Vol 7, No 1 (2012), 49-55, Jan 2012.

Zhang Changlun was born in Jining, Shandong Province, China in 97, and earned a Ph.D. degree at Beijing Jiaotong University, China in 2009. He is now a lecturer in the Science School, Beijing University of Civil Engineering and Architecture. His research focuses on network information security, network public opinion and software engineering.