A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems


Dong Yuan, Yun Yang, Xiao Liu, Jinjun Chen
Faculty of Information and Communication Technologies, Swinburne University of Technology
Hawthorn, Melbourne, Australia 3122
{dyuan, yyang, xliu, jchen}@swin.edu.au

Abstract

Many scientific workflows are data intensive: a large volume of intermediate data is generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science on the cloud has become popular, more intermediate data can be stored in scientific cloud workflows based on a pay-for-use model. In this paper, we build an Intermediate data Dependency Graph (IDG) from the data provenances in scientific workflows. Based on the IDG, we develop a novel intermediate data storage strategy that can reduce the cost of the scientific cloud workflow system by automatically storing the most appropriate intermediate datasets in the cloud storage. We utilise Amazon's cost model and apply the strategy to an astrophysics pulsar searching scientific workflow for evaluation. The results show that our strategy can reduce the overall cost of scientific cloud workflow execution significantly.

Keywords - data storage; cost; scientific workflow; cloud computing

I. INTRODUCTION

Scientific applications are usually complex and data-intensive. In many fields, such as astronomy [9], high-energy physics [17] and bio-informatics [19], scientists need to analyse terabytes of data either from existing data resources or collected from physical devices. The scientific analyses are usually computation intensive, hence taking a long time to execute. Workflow technologies can be used to automate these scientific applications. Accordingly, scientific workflows are typically very complex. They usually have a large number of tasks and need a long time for execution. During the execution, a large volume of new intermediate data will be generated [10].
They could be even larger than the original data and contain some important intermediate results. After the execution of a scientific workflow, some intermediate data may need to be stored for future use because: 1) scientists may need to re-analyse the results or apply new analyses on the intermediate data; 2) for collaboration, the intermediate results are shared among scientists from different institutions and the intermediate data can be reused. Storing valuable intermediate data can save their regeneration cost when they are reused, not to mention the waiting time saved for regeneration. Given the large size of the data, running scientific workflow applications usually needs not only high performance computing resources but also massive storage [10].

Nowadays, popular scientific workflows are often deployed in grid systems [17] because they have high performance and massive storage. However, building a grid system is extremely expensive and it is normally not open to scientists all over the world. The emergence of cloud computing technologies offers a new way to develop scientific workflow systems, in which one research topic is cost-effective strategies for storing intermediate data.

In late 2007 the concept of cloud computing was proposed [23] and it is deemed the next generation of IT platforms that can deliver computing as a kind of utility [8]. Foster et al. made a comprehensive comparison of grid computing and cloud computing [12]. Cloud computing systems provide the high performance and massive storage required for scientific applications in the same way as grid systems, but with a lower infrastructure construction cost among many other features, because cloud computing systems are composed of data centres which can be clusters of commodity hardware [23]. Research into doing science and data-intensive applications on the cloud has already commenced [18], with early experiences such as the Nimbus [15] and Cumulus [22] projects. The work by Deelman et al. [11] shows that cloud computing offers a cost-effective solution for data-intensive applications, such as scientific workflows [14]. Furthermore, cloud computing systems offer a new model in which scientists from all over the world can collaborate and conduct their research together. Cloud computing systems are based on the Internet, and so are the scientific workflow systems deployed in the cloud. Scientists can upload their data and launch their applications on the scientific cloud workflow systems from anywhere in the world via the Internet, and they only need to pay for the resources that they use for their applications. As all the data are managed in the cloud, it is easy to share data among scientists.

Scientific cloud workflows are deployed in a cloud computing environment, where all the resources need to be paid for. For a scientific cloud workflow system, storing all the intermediate data generated during workflow executions may cause a high storage cost. On the contrary, if we delete all the intermediate data and regenerate them every time they are needed, the computation cost of the system may well be very high too. The goal of intermediate data management is to reduce the total cost of the whole system. The best way is to find a balance that selectively stores some popular datasets and regenerates the rest when needed.

In this paper, we propose a novel strategy for the intermediate data storage of scientific cloud workflows to reduce the overall cost of the system. The intermediate data in scientific cloud workflows often have dependencies. During workflow execution, they are generated by the tasks. A task can operate on one or more datasets and generate new one(s). These generation relationships are a kind of data provenance. Based on the data provenance, we create an Intermediate data Dependency Graph (IDG), which records the information of all the intermediate datasets that have ever existed in the cloud workflow system, no matter whether they are stored or deleted. With the IDG, the system knows how the intermediate datasets are generated and can further calculate their generation cost. Given an intermediate dataset, we divide its generation cost by its usage rate, so that this cost (the generation cost per unit time) can be compared with its storage cost per unit time, where a dataset's usage rate is the time between every usage of this dataset, which can be obtained from the system log. Our strategy can automatically decide whether an intermediate dataset should be stored or deleted in the cloud system by comparing the generation cost and storage cost, no matter whether this intermediate dataset is a new dataset, a regenerated dataset or a stored dataset in the system.

The remainder of this paper is organised as follows. Section 2 gives a motivating example and analyses the research problems. Section 3 introduces some important concepts related to our strategy. Section 4 presents the detailed algorithms in our strategy. Section 5 demonstrates the simulation results and the evaluation. Section 6 discusses the related work. Section 7 addresses our conclusions and future work.

II. MOTIVATING EXAMPLE AND PROBLEM ANALYSIS

2.1 Motivating example

Scientific applications often need to process a large amount of data. For example, the Swinburne Astrophysics group has been conducting a pulsar searching survey using the observation data from the Parkes Radio Telescope, which is one of the most famous radio telescopes in the world. Pulsar searching is a typical scientific application. It contains complex and time consuming tasks and needs to process terabytes of data. Fig. 1 depicts the high level structure of a pulsar searching workflow.

Figure 1. Pulsar searching workflow

At the beginning, raw signal data from the Parkes Radio Telescope are recorded at a rate of one gigabyte per second by the ATNF Parkes Swinburne Recorder (APSR), where ATNF refers to the Australia Telescope National Facility. Depending on the areas of the universe in which the researchers want to conduct the pulsar searching survey, the observation time is normally from 4 minutes to one hour. Recorded from the telescope in real time, these raw data files have data from multiple beams interleaved. For initial preparation, different beam files are extracted from the raw data files and compressed. They are 1GB to 20GB each in size, depending on the observation time. The beam files contain the pulsar signals, which are dispersed by the interstellar medium. De-dispersion is to counteract this effect. Since the potential dispersion source is unknown, a large number of de-dispersion files will be generated with different dispersion trials. In the current pulsar searching survey, 1200 is the minimum number of the dispersion trials. Based on the size of the input beam file, this de-dispersion step will take 1 to 13 hours to finish and generate up to 90GB of de-dispersion files. Furthermore, for binary pulsar searching, every de-dispersion file will need another step of processing named accelerate. This step will generate the accelerated de-dispersion files, which are similar in size to the files from the last de-dispersion step.
Based on the generated de-dispersion files, different seeking algorithms can be applied to search for pulsar candidates, such as FFT Seeking, FFA Seeking, and Single Pulse Seeking. For a large input beam file, it will take more than one hour to seek the 1200 de-dispersion files. A candidate list of pulsars, saved in a text file, will be generated after the seeking step. Furthermore, by comparing the candidates generated from different beam files in the same time session, some interference may be detected and some candidates may be eliminated. With the final pulsar candidates, we need to go back to the beam files or the de-dispersion files to find their feature signals and fold them to XML files. At last, the XML files will be visually displayed to researchers for making decisions on whether a pulsar has been found or not.

As described above, this pulsar searching workflow is both computation and data intensive. It is currently running on the Swinburne high performance supercomputing facility. It needs a long execution time and a large amount of intermediate data is generated. At present, all the intermediate data are deleted after having been used, and the scientists only store the raw beam data, which are extracted from the raw telescope data. Whenever the intermediate data are needed, the scientists regenerate them from the raw beam files. The intermediate data are not stored, mainly because the supercomputer is a shared facility that cannot offer unlimited storage capacity to hold the accumulated terabytes of data. However, some intermediate data are better stored. For example, the de-dispersion files are frequently used intermediate data. Based on them, the scientists can apply different seeking algorithms to find potential pulsar candidates. Furthermore, some intermediate data are derived from the de-dispersion files, such as the results of the seek algorithms and the pulsar candidate list. If these data are reused, the de-dispersion files will also need to be regenerated. For the large input beam files, the regeneration of the de-dispersion files will take more than 10 hours. It not only delays the scientists from conducting their experiments, but also wastes a lot of computation resources. On the other hand, some intermediate data may not need to be stored, for example the accelerated de-dispersion files, which are generated by the accelerate step. The accelerate step is an optional step that is only for binary pulsar searching. Not all pulsar searching processes need to accelerate the de-dispersion files, so the accelerated de-dispersion files are not used that often.
In light of this and given the large size of these data, they are not worth storing, as it would be more cost effective to regenerate them from the de-dispersion files whenever they are used.

2.2 Problem analysis

Traditionally, scientific workflows are deployed on high performance computing facilities, such as clusters and grids. Scientific workflows are often complex, with huge intermediate data generated during their execution. How to store these intermediate data is normally decided by the scientists who use the scientific workflows. This is because the clusters and grids only serve certain institutions. The scientists may store the intermediate data that are most valuable to them, based on the storage capacity of the system. However, in many scientific workflow systems, the storage capacities are limited, as in the pulsar searching workflow we introduced. The scientists have to delete all the intermediate data because of the storage limitation. This storage bottleneck can be avoided if we run scientific workflows in the cloud.

In a cloud computing environment, theoretically, the system can offer unlimited storage resources. All the intermediate data generated by scientific cloud workflows can be stored, if we are willing to pay for the required resources. However, in scientific cloud workflow systems, whether to store intermediate data or not is no longer an easy decision.

1) All the resources in the cloud carry certain costs, so whether we store or generate an intermediate dataset, we have to pay for the resources used. The intermediate datasets vary in size, and have different generation costs and usage rates. Some of them may be used often whilst others may not. On one hand, it is most likely not cost effective to store all the intermediate data in the cloud. On the other hand, if we delete them all, regeneration of frequently used intermediate datasets imposes a high computation cost. We need a strategy to balance the generation cost and the storage cost of the intermediate data, in order to reduce the total cost of the scientific cloud workflow system.
In this paper, given the large capacity of data centres and the consideration of cost effectiveness, we assume that all the intermediate data are stored in one data centre; therefore, data transfer cost is not considered.

2) The scientists can no longer predict the usage rate of the intermediate data. For a single research group, if the data resources of the applications are only used by its own scientists, the scientists may predict the usage rate of the intermediate data and decide whether to store or delete them. However, the scientific cloud workflow system is not developed for a single scientist or institution; rather, it is developed for scientists from different institutions to collaborate and share data resources. The users of the system could be anonymous users from the Internet. We must have a strategy that stores the intermediate data based on the needs of all the users and can reduce the cost of the whole system.

Hence, for scientific cloud workflow systems, we need a strategy that can automatically select and store the most appropriate intermediate datasets. Furthermore, this strategy should be cost effective, reducing the total cost of the whole system.

III. COST ORIENTED INTERMEDIATE DATA STORAGE IN SCIENTIFIC CLOUD WORKFLOWS

3.1 Data management in scientific cloud workflow systems

In a cloud computing system, application data are stored in large data centres. The cloud users visit the system via the Internet and upload the data to conduct their applications. All the application data are stored in the cloud storage and managed by the cloud system independent of users. As time goes on and the number of cloud users increases, the volume of data stored in the cloud will become huge. This makes data management in cloud computing systems a very challenging job.

A scientific cloud workflow system is a workflow system for scientists to run their applications in the cloud. As depicted in Figure 2, it differs in many ways from traditional scientific workflow systems in data management. The most important differences are as follows.

Figure 2. Structure of data management in scientific cloud workflow system

1) For scientific cloud workflows, all the application data are managed in the cloud. To launch their workflows, the scientists have to upload their application data to the cloud storage via a Web portal. This requires data management to be automatic.
2) The scientific cloud workflow system has a cost model. The scientists have to pay for the resources used for conducting their applications. Hence, the data management has to be cost oriented.
3) The scientific cloud workflow system is based on the Internet, where the application data are shared and reused among scientists worldwide. For data reanalysis and regeneration, data provenance is more important in scientific cloud workflows.

In general, there are two types of data stored in the cloud storage: input data and intermediate data, the latter including result data. First, input data are the data uploaded by users; in scientific applications they can also be the raw data collected from the devices. These data are the original data for processing or analysis and are usually the input of the applications. The most important feature of these data is that if they were deleted, they could not be regenerated by the system. Second, intermediate data are the data newly generated in the cloud system while the application runs. These data save the intermediate computation results of the application that will be used in future execution. In general, the final result data of the applications are a kind of intermediate data, because the result data in one application can also be used in other applications. When further operations are applied to the result data, they become intermediate data.
Hence, the intermediate data are the data generated based on either the input data or other intermediate data, and their most important feature is that they can be regenerated if we know their provenance. For the input data, the users will decide whether they should be stored or deleted, since they cannot be regenerated once deleted. For the intermediate data, their storage status can be decided by the system, since they can be regenerated. Hence, in this paper we develop a strategy for intermediate data storage that can significantly reduce the cost of the scientific cloud workflow system.

3.2 Data provenance and Intermediate data Dependency Graph (IDG)

Scientific workflows have many computation and data intensive tasks that will generate many intermediate datasets of considerable size. Dependencies exist among the intermediate datasets. Data provenance in workflows is a kind of important metadata, in which the dependencies between datasets are recorded [21]. The dependency depicts the derivation relationship between workflow intermediate datasets. For scientific workflows, data provenance is especially important, because after the execution, some intermediate datasets may be deleted, but sometimes the scientists have to regenerate them for either reuse or reanalysis [7]. Data provenance records the information of how the intermediate datasets were generated, which is very important for the scientists. Furthermore, regeneration of the intermediate datasets from the input data may be very time consuming, and therefore carry a high cost. With data provenance information, the regeneration of the demanded dataset may start from some stored intermediate datasets instead. In the scientific cloud workflow system, data provenance is recorded during workflow execution. Taking advantage of data provenance, we can build an IDG. References to all the intermediate datasets ever generated in the system, whether stored or deleted, are recorded in the IDG.

Figure 3. A simple Intermediate data Dependency Graph (IDG)
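The provenance-based graph described above can be illustrated with a small Python sketch. It is not the authors' implementation: the class names, the `regeneration_plan` helper and the storage flags in the example are assumptions made here for illustration. It records each dataset with its direct predecessors and, for a deleted dataset, walks back to the nearest stored predecessors so that regeneration does not have to start from the input data.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One IDG node: an intermediate dataset and its storage flag."""
    name: str
    stored: bool = False
    parents: list = field(default_factory=list)  # direct predecessors (provenance)


class IDG:
    """Directed acyclic graph of intermediate datasets (hypothetical API)."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, parents=(), stored=False):
        # Parents must already be in the graph, which keeps it acyclic.
        node = Dataset(name, stored, [self.nodes[p] for p in parents])
        self.nodes[name] = node
        return node

    def regeneration_plan(self, name):
        """Return (start_from, to_regenerate) for the dataset `name`.

        start_from: nearest stored predecessors that can be read directly.
        to_regenerate: deleted predecessors to rebuild first, in a valid order.
        """
        start_from, to_regenerate, seen = [], [], set()

        def visit(node):
            for parent in node.parents:
                if parent.name in seen:
                    continue
                seen.add(parent.name)
                if parent.stored:
                    start_from.append(parent.name)
                else:
                    visit(parent)  # walk further back before rebuilding
                    to_regenerate.append(parent.name)

        visit(self.nodes[name])
        return start_from, to_regenerate


# Hypothetical storage flags: d1 and d3 are stored, d2 and d4 are deleted.
idg = IDG()
idg.add("d1", stored=True)
idg.add("d2", parents=["d1"])
idg.add("d3", stored=True)
idg.add("d4", parents=["d2", "d3"])
print(idg.regeneration_plan("d4"))  # (['d1', 'd3'], ['d2'])
```

Keeping deleted datasets as nodes is the design point: the graph stays complete, so the system can always find a regeneration path even after the data themselves are gone.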

The IDG is a directed acyclic graph, where every node in the graph denotes an intermediate dataset. Figure 3 shows a simple IDG: dataset \(d_1\) pointing to \(d_2\) means \(d_1\) is used to generate \(d_2\); datasets \(d_2\) and \(d_3\) pointing to \(d_4\) means \(d_2\) and \(d_3\) are used together to generate \(d_4\); and \(d_5\) pointing to \(d_6\) and \(d_7\) means \(d_5\) is used to generate either \(d_6\) or \(d_7\) based on different operations. In the IDG, the provenances of all the intermediate datasets are recorded. When some of the deleted intermediate datasets need to be reused, we do not need to regenerate them from the original input data. With the IDG, the system can find the predecessor datasets of the demanded data, so they can be regenerated from their nearest existing predecessor datasets.

3.3 Cost model

With the IDG, given any intermediate dataset that ever occurred in the system, we know how to regenerate it. However, in this paper, we aim at reducing the total cost of managing the intermediate data. In a cloud computing environment, if the users want to deploy and run applications, they need to pay for the resources used. The resources are offered by cloud service providers, who have their cost models to charge the users. In general, there are two basic types of resources in cloud computing: storage and computation. Popular cloud service providers' cost models are based on these two types of resources [1]. Furthermore, the cost of data transfer is also considered, such as in Amazon's cost model. In [11], the authors state that a cost-effective way of doing science in the cloud is to upload all the application data to the cloud and run all the applications in the cloud services. So we assume that the scientists upload all the input data to the cloud to conduct their experiments. Because transferring data within one cloud service provider's facilities is usually free, the data transfer cost of managing intermediate data during the workflow execution is not counted.
In this paper, we define our cost model for managing the intermediate data in a scientific cloud workflow system as follows:

\[ Cost = C + S, \]

where the total cost of the system, \(Cost\), is the sum of \(C\), which is the total cost of computation resources used to regenerate the intermediate data, and \(S\), which is the total cost of storage resources used to store the intermediate data. For the resources, different cloud service providers have different prices. In this paper, we use Amazon's service prices as follows:

$0.15 per Gigabyte per month for the storage resources.
$0.1 per CPU hour for the computation resources.

Furthermore, we denote these two prices as \(CostS\) and \(CostC\) respectively for the algorithms.

To utilise the cost model, we define some important attributes for the intermediate datasets in the IDG. For intermediate dataset \(d\), its attributes are denoted as \(\langle size, flag, t_p, t, pSet, fSet, CostR \rangle\), where:

\(size\) denotes the size of this dataset;

\(flag\) denotes the status of this dataset, that is, whether it is stored or deleted in the system;

\(t_p\) denotes the time of generating this dataset from its direct predecessor datasets;

\(t\) denotes the usage rate, which is the time between every usage of \(d\) in the system. In traditional scientific workflows, \(t\) can be defined by the scientists who use the workflow collaboratively. However, a scientific cloud workflow system is based on the Internet with a large number of users, so, as discussed before, \(d.t\) cannot be defined by users. It is a value forecast from the dataset's usage history recorded in the system logs, and it changes dynamically according to \(d\)'s real usage rate in the system;

\(pSet\) is the set of references of all the deleted intermediate datasets in the IDG that link to \(d\), as shown in Figure 4. If we want to regenerate \(d\), \(d.pSet\) contains all the datasets that need to be regenerated beforehand. Hence, the generation cost of \(d\) can be denoted as:

\[ gencost(d) = \Big( d.t_p + \sum_{d_i \in d.pSet} d_i.t_p \Big) \times CostC; \]

\(fSet\) is the set of references of all the deleted intermediate datasets in the IDG that are linked by \(d\), as shown in Figure 4. If \(d\) is deleted, to regenerate any dataset in \(d.fSet\), we have to regenerate \(d\) first. In other words, if the storage status of \(d\) has changed, the generation cost of all the datasets in \(d.fSet\) will be affected by \(gencost(d)\);

Figure 4. A segment of IDG

\(CostR\) is \(d\)'s cost rate, which means the average cost per time unit of the dataset \(d\) in the system; in this paper we use the hour as the time unit. If \(d\) is a stored dataset, \(d.CostR = d.size \times CostS\). If \(d\) is a deleted dataset in the system, we have to regenerate it when we need to use it. So we divide the generation cost of \(d\) by the time between its usages and use this value as the cost rate of \(d\) in the system: \(d.CostR = gencost(d) / d.t\). When the storage status of \(d\) is changed, its \(CostR\) will be changed correspondingly.

Hence, the system cost rate of managing intermediate data is the sum of \(CostR\) of all the intermediate datasets, which is \(\sum_{d_i \in IDG} d_i.CostR\). Given a time duration, denoted as \([T_0, T_n]\), the total system cost is the integral of the system cost rate in this duration as a function of time \(t\), which is:

\[ Total\_Cost = \int_{t=T_0}^{T_n} \Big( \sum_{d_i \in IDG} d_i.CostR \Big) \, dt \]

The goal of our intermediate data management is to reduce this cost. In the next section, we will introduce a dependency based intermediate data storage strategy, which selectively stores the intermediate datasets to reduce the total cost of the scientific cloud workflow system.

IV. DEPENDENCY BASED INTERMEDIATE DATA STORAGE STRATEGY

The IDG records the references of all the intermediate datasets that ever occurred in the system and their dependencies; some datasets may be stored in the system, and others may be deleted. When new datasets are generated in the system, their information is added to the IDG for the first time. Our dependency based intermediate data storage strategy is developed based on the IDG and applied at workflow runtime. It can dynamically store the essential intermediate datasets during workflow execution. The strategy contains three algorithms described in this section.

4.1 Algorithm for deciding newly generated intermediate datasets' storage status

Suppose \(d_0\) is a newly generated intermediate dataset. First, we add its information to the IDG. We find the provenance datasets of \(d_0\) in the IDG, and add edges pointing to \(d_0\) from these datasets. Then we initialise its attributes. As \(d_0\) does not have a usage history yet, we use the average value in the system as the initial value of \(d_0\)'s usage rate. Next, we check whether \(d_0\) needs to be stored or not. As \(d_0\) is newly added to the IDG, it does not have successor datasets in the IDG, which means no intermediate datasets are derived from \(d_0\) at this moment. For deciding whether to store or delete \(d_0\), we only compare the generation cost rate and the storage cost rate of \(d_0\) itself, which are \(gencost(d_0)/d_0.t\) and \(d_0.size \times CostS\). If the cost of generation is larger than the cost of storing it, we save \(d_0\) and set \(d_0.CostR = d_0.size \times CostS\); otherwise we delete \(d_0\) and set \(d_0.CostR = gencost(d_0)/d_0.t\). The algorithm is shown in Figure 5.

  \(gencost(d_0) = (d_0.t_p + \sum_{d_i \in d_0.pSet} d_i.t_p) \times CostC;\)
  if \(gencost(d_0)/d_0.t > d_0.size \times CostS\)
      \(d_0.CostR = d_0.size \times CostS;\)
  else
      \(d_0.CostR = gencost(d_0)/d_0.t;\)

Figure 5. Algorithm for handling newly generated datasets

In this algorithm, we guarantee that all the intermediate datasets chosen to be stored are necessary, which means that deleting any one of them would increase the cost to the system, since they all have a higher generation cost than storage cost.

4.2 Algorithm for managing stored intermediate datasets

The usage rate \(t\) of a dataset is an important parameter that determines its storage status. Since \(t\) is a dynamic value that may change at any time, we have to dynamically check whether the stored intermediate datasets in the system still need to be stored.

For an intermediate dataset \(d_0\) that is stored in the system, we set a threshold time \(t_\theta\), where \(d_0.t_\theta = gencost(d_0) / (d_0.size \times CostS)\). This threshold time indicates how long this dataset can be stored in the system for the cost of generating it. If \(d_0\) has not been used for the time of \(t_\theta\), we will check whether it should still be stored.

If we delete stored intermediate dataset \(d_0\), the system cost rate is reduced by \(d_0\)'s storage cost rate, which is \(d_0.size \times CostS\). Meanwhile, the increase of the system cost rate is the sum of the generation cost rate of \(d_0\) itself, which is \(gencost(d_0)/d_0.t\), and the increased generation cost rates of all the datasets in \(d_0.fSet\) caused by deleting \(d_0\), which is \(\sum_{d_i \in d_0.fSet} (gencost(d_0)/d_i.t)\). We compare \(d_0\)'s storage cost rate and generation cost rate to decide whether \(d_0\) should be stored or not. The detailed algorithm is shown in Figure 6.

Lemma: The deletion of stored intermediate dataset \(d_0\) in the IDG does not affect the stored datasets adjacent to \(d_0\), where the stored datasets adjacent to \(d_0\) means the datasets that directly link to \(d_0\) or \(d_0.pSet\), and the datasets that are directly linked by \(d_0\) or \(d_0.fSet\).

Proof: 1) Suppose \(d_p\) is a stored dataset directly linked to \(d_0\) or \(d_0.pSet\). Since \(d_0\) is deleted, \(d_0\) and \(d_0.fSet\) are added to \(d_p.fSet\). So the new generation cost rate of \(d_p\) in the system is

\[ gencost(d_p)/d_p.t + \sum_{d_i \in d_p.fSet \cup \{d_0\} \cup d_0.fSet} (gencost(d_p)/d_i.t), \]

and it is larger than before, which was

\[ gencost(d_p)/d_p.t + \sum_{d_i \in d_p.fSet} (gencost(d_p)/d_i.t). \]

Hence \(d_p\) still needs to be stored.

2) Suppose \(d_f\) is a stored dataset directly linked by \(d_0\) or \(d_0.fSet\). Since \(d_0\) is deleted, \(d_0\) and \(d_0.pSet\) are added to \(d_f.pSet\). So the new generation cost of \(d_f\) is

\[ gencost(d_f) = \Big( d_f.t_p + \sum_{d_i \in d_f.pSet \cup \{d_0\} \cup d_0.pSet} d_i.t_p \Big) \times CostC, \]

and it is larger than before, which was

\[ gencost(d_f) = \Big( d_f.t_p + \sum_{d_i \in d_f.pSet} d_i.t_p \Big) \times CostC. \]

Because of the increase of \(gencost(d_f)\), the generation cost rate of \(d_f\) in the system is larger than before, which was

\[ gencost(d_f)/d_f.t + \sum_{d_i \in d_f.fSet} (gencost(d_f)/d_i.t). \]

Hence \(d_f\) still needs to be stored. Because of 1) and 2), the Lemma holds.

  Input: a stored intermediate dataset \(d_0\); an IDG;
  Output: storage strategy of \(d_0\);
  \(gencost(d_0) = (d_0.t_p + \sum_{d_i \in d_0.pSet} d_i.t_p) \times CostC;\) //calculate \(d_0\)'s generation cost
  if \(gencost(d_0)/d_0.t + \sum_{d_i \in d_0.fSet} (gencost(d_0)/d_i.t) > d_0.size \times CostS\) //compare \(d_0\)'s storage and generation cost rate
      \(T' = T + d_0.t_\theta;\) //set the next checking time \(T'\); \(T\) is the current system time, \(t_\theta\) is the duration \(d_0\) should be stored
  else
      \(d_0.flag = deleted;\) //decide to delete \(d_0\)
      \(d_0.CostR = gencost(d_0)/d_0.t;\) //change \(d_0\)'s cost rate
      for (every \(d_i\) in \(d_0.fSet\)) //change the cost rates of all the datasets in \(d_0.fSet\)
          \(d_i.CostR = d_i.CostR + gencost(d_0)/d_i.t;\) //cost rate increases with the generation cost of \(d_0\)
  update IDG & execute store or delete of \(d_0\);

Figure 6. Algorithm for checking stored intermediate datasets

By applying the algorithm of checking the stored intermediate datasets, we can still guarantee that all the datasets we have kept in the system are necessary to be stored. Furthermore, when the deleted intermediate datasets are regenerated, we also need to check whether to store or delete them, as discussed next.

4.3 Algorithm for deciding the regenerated intermediate datasets' storage status

The IDG is a dynamic graph where the information of new intermediate datasets may join at any time.
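The cost model of Section 3.3, the store-or-delete test of Section 4.1 and the threshold time of Section 4.2 can be sketched in Python as follows. This is a minimal illustration rather than the authors' code. It assumes a 30-day month to turn the monthly storage price into an hourly rate, and it treats one hour of generation time as one CPU hour; the paper does not state either conversion. The dataset figures in the example are hypothetical.

```python
from dataclasses import dataclass, field

COST_C = 0.1                      # $ per CPU hour (price used in the paper)
HOURS_PER_MONTH = 30 * 24         # assumption: 30-day month, to get an hourly rate
COST_S = 0.15 / HOURS_PER_MONTH   # $ per GB per hour


@dataclass
class Dataset:
    size: float                   # GB
    t_p: float                    # hours to generate from direct predecessors
    t: float                      # hours between usages (usage rate)
    stored: bool = False          # the "flag" attribute
    pset: list = field(default_factory=list)  # deleted predecessors (pSet)
    cost_rate: float = 0.0        # CostR in $ per hour


def gencost(d: Dataset) -> float:
    """Generation cost: d's own time plus that of every dataset in pSet."""
    return (d.t_p + sum(p.t_p for p in d.pset)) * COST_C


def storage_rate(d: Dataset) -> float:
    """Storage cost rate, size x CostS."""
    return d.size * COST_S


def decide_new(d: Dataset) -> bool:
    """Section 4.1: store d only if regenerating it costs more per hour."""
    d.stored = gencost(d) / d.t > storage_rate(d)
    d.cost_rate = storage_rate(d) if d.stored else gencost(d) / d.t
    return d.stored


def threshold_time(d: Dataset) -> float:
    """Section 4.2: hours of storage that cost as much as one regeneration."""
    return gencost(d) / storage_rate(d)


# Hypothetical 90 GB dataset that takes 13 hours to generate.
often = Dataset(size=90, t_p=13, t=48)     # reused every 2 days
rarely = Dataset(size=90, t_p=13, t=240)   # reused every 10 days
print(decide_new(often), decide_new(rarely))   # True False
print(round(threshold_time(often), 1))         # 69.3
```

Under these assumptions the same dataset is worth storing when it is reused every two days but not when it is reused every ten days, which is why the usage rate has to be tracked dynamically.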
Although the algorithms in the above two sub-sections can guarantee that the stored intermediate datasets are all necessary, these stored datasets may not be the most cost effective. Initially deleted intermediate datasets may need to be stored as the IDG expands. Suppose \(d_0\) is a regenerated intermediate dataset in the system, which has been deleted before. After it has been used, we have to recalculate \(d_0\)'s storage status, as well as that of the stored datasets adjacent to \(d_0\) in the IDG.

Theorem: If regenerated intermediate dataset \(d_0\) is stored, only the stored datasets adjacent to \(d_0\) in the IDG may need to be deleted to reduce the system cost.

Proof: 1) Suppose \(d_p\) is a stored dataset directly linked to \(d_0\) or \(d_0.pSet\). Since \(d_0\) is stored, \(d_0\) and \(d_0.fSet\) need to be removed from \(d_p.fSet\). So the new generation cost rate of \(d_p\) in the system is

\[ gencost(d_p)/d_p.t + \sum_{d_i \in d_p.fSet \setminus (\{d_0\} \cup d_0.fSet)} (gencost(d_p)/d_i.t), \]

and it is smaller than before, which was

\[ gencost(d_p)/d_p.t + \sum_{d_i \in d_p.fSet} (gencost(d_p)/d_i.t). \]

If the new generation cost rate is smaller than the storage cost rate of \(d_p\), \(d_p\) would be deleted. The rest of the stored intermediate datasets are not affected by the deletion of \(d_p\), because of the Lemma introduced before.

2) Suppose \(d_f\) is a stored dataset directly linked by \(d_0\) or \(d_0.fSet\). Since \(d_0\) is stored, \(d_0\) and \(d_0.pSet\) need to be removed from \(d_f.pSet\). So the new generation cost of \(d_f\) is

\[ gencost(d_f) = \Big( d_f.t_p + \sum_{d_i \in d_f.pSet \setminus (\{d_0\} \cup d_0.pSet)} d_i.t_p \Big) \times CostC, \]

and it is smaller than before, which was

\[ gencost(d_f) = \Big( d_f.t_p + \sum_{d_i \in d_f.pSet} d_i.t_p \Big) \times CostC. \]

Because of the reduction of \(gencost(d_f)\), the generation cost rate of \(d_f\) in the system is smaller than before, which was

\[ gencost(d_f)/d_f.t + \sum_{d_i \in d_f.fSet} (gencost(d_f)/d_i.t). \]

If the new generation cost rate is smaller than the storage cost rate of \(d_f\), \(d_f\) would be deleted. The rest of the stored intermediate datasets are not affected by the deletion of \(d_f\), because of the Lemma introduced before. Because of 1) and 2), the Theorem holds.
If we store regenerated intermediate dataset \(d_0\), the cost rate of the system increases by \(d_0\)'s storage cost rate, which is \(d_0.size \times CostS\). Meanwhile, the reduction of the system cost rate may result from three aspects:
(1) The generation cost rate of \(d_0\) itself, which is \(gencost(d_0)/d_0.t\);
(2) The reduced generation cost rates of all the datasets in \(d_0.fSet\) caused by storing \(d_0\), which is \(\sum_{d_i \in d_0.fSet} (gencost(d_0)/d_i.t)\);
(3) As indicated in the Theorem, some stored datasets adjacent to \(d_0\) may be deleted, which reduces the cost to the system.
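Aspects (1) and (2) can be weighed against the added storage cost rate with a few lines of Python. The sketch below is illustrative only: the function name and all numbers are hypothetical, the monthly storage price is converted to an hourly rate assuming a 30-day month, and aspect (3), the possible deletion of adjacent stored datasets, is deliberately left out.

```python
COST_S = 0.15 / (30 * 24)   # $ per GB per hour; assumes a 30-day month


def net_saving_if_stored(size_gb, gen_cost, t, fset_t):
    """Hourly saving from storing a regenerated dataset d0 (aspects 1 and 2).

    size_gb  -- d0.size in GB
    gen_cost -- gencost(d0) in $
    t        -- d0.t, hours between usages of d0
    fset_t   -- usage intervals (hours) of the deleted datasets in d0.fSet
    """
    reduction = gen_cost / t + sum(gen_cost / t_i for t_i in fset_t)
    increase = size_gb * COST_S
    return reduction - increase


# Hypothetical numbers: on its own d0 is not worth storing ...
print(net_saving_if_stored(90, 1.3, 240, []) > 0)            # False
# ... but two deleted successors that depend on it tip the balance.
print(net_saving_if_stored(90, 1.3, 240, [100, 120]) > 0)    # True
```

The second call shows why the dependency information matters: a dataset that is rarely used itself can still be worth storing when frequently used deleted datasets are derived from it.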

We compare the increase and the reduction of the system cost rate to decide whether $d_0$ should be stored or not. The detailed algorithm is shown in Figure 7. Its steps, with $\Delta$ denoting the accumulated reduction of the system cost rate, are:

1) Initialise $\Delta = \frac{\mathit{genCost}(d_0)}{d_0.t} + \sum_{d_i \in d_0.fSet} \frac{\mathit{genCost}(d_0)}{d_i.t} - d_0.size \times \mathit{CostS}$, and for every $d_i \in d_0.fSet$ set $d_i.CostR = d_i.CostR - \mathit{genCost}(d_0)/d_i.t$.

2) For every stored dataset $d_j$ directly linked to $d_0$ or $d_0.pSet$: if $\frac{\mathit{genCost}(d_j)}{d_j.t} + \sum_{d_m \in d_j.fSet} \frac{\mathit{genCost}(d_j)}{d_m.t} < d_j.size \times \mathit{CostS}$, delete $d_j$, set $d_j.CostR = \mathit{genCost}(d_j)/d_j.t$, set $d_m.CostR = d_m.CostR + \mathit{genCost}(d_j)/d_m.t$ for every $d_m \in d_j.fSet$, and update $\Delta = \Delta + d_j.size \times \mathit{CostS} - \frac{\mathit{genCost}(d_j)}{d_j.t} - \sum_{d_m \in d_j.fSet} \frac{\mathit{genCost}(d_j)}{d_m.t}$.

3) For every stored dataset $d_k$ directly linked by $d_0$ or $d_0.fSet$: if $\frac{\mathit{genCost}(d_k)}{d_k.t} + \sum_{d_n \in d_k.fSet} \frac{\mathit{genCost}(d_k)}{d_n.t} < d_k.size \times \mathit{CostS}$, delete $d_k$, set $d_k.CostR = \mathit{genCost}(d_k)/d_k.t$, set $d_n.CostR = d_n.CostR + \mathit{genCost}(d_k)/d_n.t$ for every $d_n \in d_k.fSet$, and update $\Delta = \Delta + d_k.size \times \mathit{CostS} - \frac{\mathit{genCost}(d_k)}{d_k.t} - \sum_{d_n \in d_k.fSet} \frac{\mathit{genCost}(d_k)}{d_n.t}$.

4) If $\Delta > 0$, store $d_0$ and apply the above changes; otherwise keep $d_0$ deleted.

Figure 7. Algorithm for checking deleted intermediate datasets

By applying the algorithm for checking the regenerated intermediate datasets, we can guarantee not only that all the datasets kept in the system are necessary to be stored, but also that any change of the datasets' storage status reduces the total system cost.

V. EVALUATION

5.1 Simulation environment and strategies

The intermediate data storage strategy we propose in this paper is generic. It can be used in any scientific workflow application. In this section, we deploy it to the pulsar searching workflow described in Section 2. We use real world statistics to conduct our simulation on the Swinburne high performance supercomputing facility and demonstrate how our strategy works in storing the intermediate datasets of the pulsar searching workflow. To simulate the cloud computing environment, we set up VMware software on the physical servers and create virtual clusters as the data centre. Furthermore, we set up the Hadoop file system in the data centre to manage the application data. In the pulsar example, six intermediate datasets are generated during the workflow execution. The IDG of this pulsar searching workflow is shown in Figure 8, as well as the sizes and generation times of these intermediate datasets.
The generation times of the datasets come from running this workflow on the Swinburne supercomputer, and for the simulation we assume that the generation times of these intermediate datasets are the same in the cloud system. Furthermore, we assume that the prices of cloud services follow Amazon's cost model, i.e. $0.1 per CPU hour for computation and $0.15 per gigabyte per month for storage. To evaluate the performance of our strategy, we run five simulation strategies together and compare the total cost of the system. The strategies are: 1) Store all the intermediate datasets in the system; 2) Delete all the intermediate datasets, and regenerate them whenever needed; 3) Store the datasets that have a high generation cost; 4) Store the datasets that are most often used; and 5) Our strategy, which dynamically decides whether a dataset should be stored or deleted.
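The price assumptions above can be turned into a small cost calculation for a single dataset under strategies 1) and 2). This is only a sketch: the dataset size of 90 GB and the 30 CPU hours of generation time are hypothetical (the real values are in Figure 8), the 30-day month is an assumption, and the regeneration cost ignores any deleted predecessors.

```python
COST_C = 0.10   # $ per CPU hour for computation
COST_S = 0.15   # $ per GB per month for storage

def store_cost(size_gb, days, days_per_month=30):
    """Total cost of keeping a dataset in cloud storage for `days` days."""
    return size_gb * COST_S * days / days_per_month

def regenerate_cost(cpu_hours, days, usage_interval_days):
    """Total cost of regenerating a deleted dataset each time it is used."""
    uses = days // usage_interval_days
    return uses * cpu_hours * COST_C

# Hypothetical dataset: 90 GB, 30 CPU hours to generate, simulated for 50 days.
cost_stored = store_cost(90, 50)
cost_regen_every_4_days = regenerate_cost(30, 50, 4)
cost_regen_every_10_days = regenerate_cost(30, 50, 10)
```

For these made-up values, storing is cheaper when the dataset is used every 4 days, while regenerating is cheaper when it is used every 10 days, which is the trade-off the five strategies resolve in different ways.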

Figure 8. IDG of pulsar searching workflow

5.2 Simulation results

We run the simulations based on the estimated usage rate of every intermediate dataset. From the Swinburne astrophysics research group, we understand that the de-dispersion files are the most useful intermediate dataset. Based on these files, many accelerating and seeking methods can be used to search for pulsar candidates. Hence, we set the de-dispersion files to be used once every 4 days, and the rest of the intermediate datasets to be used once every 10 days. Based on this setting, we run the above-mentioned five simulation strategies and calculate the total costs of the system for ONE branch of the pulsar searching workflow processing ONE piece of observation data over 50 days, as shown in Figure 9.

[Figure 9. Total cost of pulsar searching workflow with Amazon's cost model. Plot of the total cost of 50 days, cost ($) against days, for the five strategies: store all, store none, store high generation cost datasets, store often used datasets, dependency based strategy.]

From Figure 9 we can see that: 1) The cost of the store all strategy is a straight line, because in this strategy all the intermediate datasets are stored in the cloud storage, which is charged at a fixed rate, and no computation cost is required; 2) The cost of the store none strategy is a fluctuating line, because in this strategy all the costs are computation costs of regenerating intermediate datasets. On days with fewer requests for the data the cost is low; otherwise, the cost is high; 3-5) For the remaining three strategies, the cost lines fluctuate only a little and the cost is much lower than that of the store all and store none strategies. This is because the intermediate datasets are partially stored. From Figure 9 we can draw the conclusion that: 1) Neither storing all the intermediate datasets nor deleting them all is a cost-effective way to handle intermediate data storage; 2) Our dependency based strategy is the most cost-effective way to store the intermediate datasets.
Furthermore, back to the pulsar searching workflow example, Table 1 shows how the five strategies store the intermediate datasets in detail.

TABLE 1. PULSAR SEARCHING WORKFLOW'S INTERMEDIATE DATASETS STORAGE STATUS IN 5 STRATEGIES

Strategies | Extracted beam | De-dispersion files | Accelerated de-dispersion files | Seek results | Pulsar candidates | XML files
Store all | Stored | Stored | Stored | Stored | Stored | Stored
Store none | Deleted | Deleted | Deleted | Deleted | Deleted | Deleted
Store high generation cost datasets | Deleted | Stored | Stored | Deleted | Deleted | Stored
Store often used datasets | Deleted | Stored | Deleted | Deleted | Deleted | Deleted
Dependency based strategy | Deleted | Stored (was deleted initially) | Deleted | Stored | Deleted | Stored

Since the intermediate datasets of this pulsar searching workflow are not complicated, we can do some straightforward analyses on how to store them. For the accelerated de-dispersion files, although their generation cost is quite high, given their huge size it is not worth storing them in the cloud. However, in the strategy of store high generation cost datasets, the accelerated de-dispersion files are chosen to be stored. Furthermore, the final XML files are not used very often, but given their high generation cost and small size, they should be stored. However, in the strategy of store often used datasets, these files are not chosen to be stored. Generally speaking, our dependency based strategy is the most appropriate strategy for intermediate data storage, and it is also dynamic. From Table 1 we can see that our strategy did not store the de-dispersion files at the beginning, but stored them after their regeneration. In our strategy, every storage status change of the datasets reduces the total system cost rate, so the strategy gradually approaches the minimum cost of the system.

One important factor that affects our dependency based strategy is the usage rate of the intermediate datasets. In a system where the usage rate of the intermediate datasets is very high, the cost of regenerating the datasets on demand is very high, and correspondingly these intermediate datasets tend to be stored. On the contrary, in a system with a very low usage rate of intermediate datasets, all the datasets tend to be deleted. In the simulation of Figure 9, we set the datasets' usage rates on the borderline that makes the total costs of the store all and store none strategies equivalent. Under this condition, the intermediate datasets have no tendency to be stored or deleted, which can objectively demonstrate our strategy's effectiveness in reducing the system cost. Next we will also demonstrate the performance of our strategy under different usage rates of the intermediate datasets.
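The effect of the usage rate can be made concrete with a break-even calculation for a single dataset $d$. The derivation below is a sketch under a simplifying assumption that is not in the paper, namely that $d.fSet$ is empty, so that only the dataset's own generation cost rate is compared with its storage cost rate. The numerical values (a generation cost of $3, a size of 90 GB and a 30-day month) are hypothetical.

```latex
% Break-even usage interval t^* for a dataset d with d.fSet = \emptyset (simplifying assumption)
\frac{\mathit{genCost}(d)}{d.t} = d.\mathit{size} \times \mathit{CostS}
\quad\Longrightarrow\quad
t^{*} = \frac{\mathit{genCost}(d)}{d.\mathit{size} \times \mathit{CostS}}

% Hypothetical values: genCost(d) = \$3, d.size = 90 GB,
% CostS = \$0.15 per GB per month = \$0.005 per GB per day (30-day month assumed)
t^{*} = \frac{3}{90 \times 0.005} = \frac{3}{0.45} \approx 6.7 \text{ days}
```

Under these assumptions the dataset is worth storing if it is used more often than about once every 6.7 days, and worth deleting if it is used less often.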
[Figure 10. Cost of pulsar searching workflow with different intermediate datasets usage rates. Two panels, (a) and (b), each plotting the total cost of 50 days, cost ($) against days, for the five strategies: store all, store none, store high generation cost datasets, store often used datasets, dependency based strategy.]

Figure 10 (a) shows the cost of the system with the usage rate of every dataset doubled in the pulsar workflow. From the figure we can see that, when the datasets' usage rates are high, the store none strategy becomes highly cost-ineffective, because the frequent regeneration of the intermediate datasets causes a very high cost to the system. In contrast, our strategy is still the most cost-effective one: the total system cost increases only slightly, and it is not much influenced by the datasets' usage rates. For the store all strategy, although it is not influenced by the usage rate, its cost is still very high. The remaining two strategies are in the mid range. They are influenced more by the datasets' usage rates, and their total costs are higher than that of our strategy.

Figure 10 (b) shows the cost of the system with the usage rate of every dataset halved in the pulsar workflow. From this figure we can see that, in a system with a low reuse rate of intermediate datasets, the store all strategy becomes highly cost-ineffective, and the store none strategy becomes relatively cost-effective. Again, our strategy is still the most cost-effective one among the five strategies.

From all the simulations we have done on the pulsar searching workflow, we find that, depending on the usage rates of the intermediate datasets, our strategy can reduce the system cost by 46.3%-74.7% in comparison to the store all strategy; 45.2%-76.3% to the store none strategy; 23.9%-58.9% to the store high generation cost datasets strategy; and 32.2%-54.7% to the store often used datasets strategy. Furthermore, to examine the generality of our strategy, we have also conducted many simulations on randomly generated workflows and intermediate data. Due to the space limit, we cannot present them here.
Based on the simulations, we can conclude that our intermediate data storage strategy performs well. By automatically selecting the valuable datasets to store, our strategy can significantly reduce the total cost of the pulsar searching workflow.

VI. RELATED WORKS

Compared to distributed computing systems like clusters and grids, a cloud computing system has a cost benefit [4]. Assunção et al. [5] demonstrate that cloud computing can extend the capacity of clusters with a cost benefit. Using Amazon clouds' cost model and BOINC volunteer computing middleware, the work in [16] analyses the cost benefit of cloud computing versus grid computing. The idea of doing science on the cloud is not new. Scientific applications have already been introduced to cloud computing systems. The Cumulus project [22] introduces a scientific cloud architecture for a data centre, and the Nimbus [15] toolkit, which can directly turn a cluster into a cloud, has already been used to build a cloud for scientific applications. In terms of the cost benefit, the work by Deelman et al. [11] also applies Amazon clouds' cost model and demonstrates that cloud computing offers a cost-effective way to deploy scientific applications. The above works mainly focus on the comparison of cloud computing systems and the traditional distributed computing paradigms, which shows that applications running on the cloud have cost benefits. In contrast, our work studies how to reduce the cost when we run scientific workflows on the cloud. In [11], Deelman et al. show that storing some popular intermediate data can save cost in comparison to always regenerating them from the input data. In [2], Adams et al. propose a model to represent the trade-off between computation cost and storage cost, but do not give a strategy to find this trade-off. In our paper, an innovative intermediate data storage strategy is developed to reduce the total cost of scientific cloud workflow systems by finding the trade-off between computation cost and storage cost. This strategy can automatically select the most appropriate intermediate data to store, based not only on the generation cost and usage rate, but also on the dependency of the workflow intermediate data. The study of data provenance is important in our work.
Due to the importance of data provenance in scientific applications, much research on recording the data provenance of the system has been done [13] [6]. Some of it is especially for scientific workflow systems [6]. Some popular scientific workflow systems, such as Kepler [17], have their own system to record provenance during the workflow execution [3]. In [20], Osterweil et al. present how to generate a Data Derivation Graph (DDG) for the execution of a scientific workflow, where one DDG records the data provenance of one execution. Similar to the DDG, our IDG is also based on the scientific workflow data provenance, but it depicts the dependency relationships of all the intermediate data in the system. With the IDG, we know where the intermediate data are derived from and how to regenerate them.

VII. CONCLUSIONS AND FUTURE WORK

In this paper, based on an astrophysics pulsar searching workflow, we have examined the unique features of intermediate data management in scientific cloud workflow systems and developed a novel cost-effective strategy that can automatically and dynamically select the appropriate intermediate datasets of a scientific workflow to store or delete in the cloud. The strategy can guarantee that the stored intermediate datasets in the system are all necessary, and can dynamically check whether the regenerated datasets need to be stored and, if so, adjust the storage strategy accordingly. Simulation results of utilising this strategy in the pulsar searching workflow indicate that our strategy can significantly reduce the total cost of the scientific cloud workflow system. Our current work is based on Amazon's cloud cost model and assumes that all the application data are stored in its cloud service. However, sometimes scientific workflows have to run in a distributed manner, since some application data are distributed and may have fixed locations. In these cases, data transfer is inevitable. In the future, we will develop data placement strategies in order to reduce data transfer among data centres.
Furthermore, to utilise our strategy more widely, models for forecasting the usage rates of intermediate data need to be studied. They must be flexible enough to adapt to different scientific applications.

ACKNOWLEDGEMENT

The research work reported in this paper is partly supported by the Australian Research Council under Linkage Project LP. We are also grateful for the discussions with Dr. W. van Straten and Ms. L. Levin from the Swinburne Centre for Astrophysics and Supercomputing on the pulsar searching process.

REFERENCES

[1] "Amazon Elastic Computing Cloud", accessed on 28 Jan
[2] I. Adams, D. D. E. Long, E. L. Miller, S. Pasupathy, and M. W. Storer, "Maximizing Efficiency By Trading Storage for Computation," in Workshop on Hot Topics in Cloud Computing (HotCloud'09), pp. 1-5,
[3] I. Altintas, O. Barney, and E. Jaeger-Frank, "Provenance Collection Support in the Kepler Scientific Workflow System," in International Provenance and Annotation Workshop, pp. ,
[4] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," University of California at Berkeley, Technical Report UCB/EECS , accessed on 28 Jan
[5] M. D. d. Assuncao, A. d. Costanzo, and R. Buyya, "Evaluating the cost-benefit of using cloud computing to extend the capacity of clusters," in 18th ACM International Symposium on High Performance Distributed Computing, Garching, Germany, pp. 1-10,
[6] Z. Bao, S. Cohen-Boulakia, S. B. Davidson, A. Eyal, and S. Khanna, "Differencing Provenance in Scientific Workflows," in 25th IEEE International Conference on Data Engineering, ICDE '09., pp. ,
[7] R. Bose and J. Frew, "Lineage retrieval for scientific data processing: a survey," ACM Comput. Surv., vol. 37, pp. 1-28,
[8] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, vol. in press, pp. 1-18,

[9] E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, S. Patil, M.-H. Su, K. Vahi, and M. Livny, "Pegasus: Mapping Scientific Workflows onto the Grid," in European Across Grids Conference, pp. ,
[10] E. Deelman and A. Chervenak, "Data Management Challenges of Data-Intensive Scientific Workflows," in IEEE International Symposium on Cluster Computing and the Grid, pp. ,
[11] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, "The Cost of Doing Science on the Cloud: the Montage example," in ACM/IEEE Conference on Supercomputing, Austin, Texas, pp. 1-12,
[12] I. Foster, Z. Yong, I. Raicu, and S. Lu, "Cloud Computing and Grid Computing 360-Degree Compared," in Grid Computing Environments Workshop, GCE '08, pp. 1-10,
[13] P. Groth and L. Moreau, "Recording Process Documentation for Provenance," IEEE Transactions on Parallel and Distributed Systems, vol. 20, pp. ,
[14] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good, "On the Use of Cloud Computing for Scientific Workflows," in 4th IEEE International Conference on e-Science, pp. ,
[15] K. Keahey, R. Figueiredo, J. Fortes, T. Freeman, and M. Tsugawa, "Science Clouds: Early Experiences in Cloud Computing for Scientific Applications," in First Workshop on Cloud Computing and its Applications (CCA'08), pp. 1-6,
[16] D. Kondo, B. Javadi, P. Malecot, F. Cappello, and D. P. Anderson, "Cost-benefit analysis of Cloud Computing versus desktop grids," in IEEE International Symposium on Parallel & Distributed Processing, IPDPS'09, pp. 1-12,
[17] B. Ludascher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, and E. A. Lee, "Scientific workflow management and the Kepler system," Concurrency and Computation: Practice and Experience, pp. ,
[18] C. Moretti, J. Bulosan, D. Thain, and P. J. Flynn, "All-Pairs: An Abstraction for Data-Intensive Cloud Computing," in IEEE International Parallel & Distributed Processing Symposium, IPDPS'08, pp. 1-11,
[19] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M. R. Pocock, A. Wipat, and P. Li, "Taverna: A tool for the composition and enactment of bioinformatics workflows," Bioinformatics, vol. 20, pp. ,
[20] L. J. Osterweil, L. A. Clarke, A. M. Ellison, R. Podorozhny, A. Wise, E. Boose, and J. Hadley, "Experience in Using A Process Language to Define Scientific Workflow and Generate Dataset Provenance," in 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Atlanta, Georgia, pp. ,
[21] Y. L. Simmhan, B. Plale, and D. Gannon, "A survey of data provenance in e-science," SIGMOD Rec., vol. 34, pp. ,
[22] L. Wang, J. Tao, M. Kunze, A. C. Castellanos, D. Kramer, and W. Karl, "Scientific Cloud Computing: Early Definition and Experience," in 10th IEEE International Conference on High Performance Computing and Communications, HPCC '08., pp. ,
[23] A. Weiss, "Computing in the Cloud," ACM Networker, vol. 11, pp. ,


Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Pricing Model of Cloud Computing Service with Partial Multihoming

Pricing Model of Cloud Computing Service with Partial Multihoming Prcng Model of Cloud Computng Servce wth Partal Multhomng Zhang Ru 1 Tang Bng-yong 1 1.Glorous Sun School of Busness and Managment Donghua Unversty Shangha 251 Chna E-mal:ru528369@mal.dhu.edu.cn Abstract

More information

A Dynamic Energy-Efficiency Mechanism for Data Center Networks

A Dynamic Energy-Efficiency Mechanism for Data Center Networks A Dynamc Energy-Effcency Mechansm for Data Center Networks Sun Lang, Zhang Jnfang, Huang Daochao, Yang Dong, Qn Yajuan A Dynamc Energy-Effcency Mechansm for Data Center Networks 1 Sun Lang, 1 Zhang Jnfang,

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters Frequency Selectve IQ Phase and IQ Ampltude Imbalance Adjustments for OFDM Drect Converson ransmtters Edmund Coersmeer, Ernst Zelnsk Noka, Meesmannstrasse 103, 44807 Bochum, Germany edmund.coersmeer@noka.com,

More information

Fair Virtual Bandwidth Allocation Model in Virtual Data Centers

Fair Virtual Bandwidth Allocation Model in Virtual Data Centers Far Vrtual Bandwdth Allocaton Model n Vrtual Data Centers Yng Yuan, Cu-rong Wang, Cong Wang School of Informaton Scence and Engneerng ortheastern Unversty Shenyang, Chna School of Computer and Communcaton

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Dynamic Pricing for Smart Grid with Reinforcement Learning

Dynamic Pricing for Smart Grid with Reinforcement Learning Dynamc Prcng for Smart Grd wth Renforcement Learnng Byung-Gook Km, Yu Zhang, Mhaela van der Schaar, and Jang-Won Lee Samsung Electroncs, Suwon, Korea Department of Electrcal Engneerng, UCLA, Los Angeles,

More information

Resource Management and Organization in CROWN Grid

Resource Management and Organization in CROWN Grid Resource Management and Organzaton n CROWN Grd Jnpeng Hua, Tanyu Wo, Yunhao Lu Dept. of Computer Scence and Technology, Behang Unversty Dept. of Computer Scence, Hong Kong Unversty of Scence & Technology

More information

Feasibility of Using Discriminate Pricing Schemes for Energy Trading in Smart Grid

Feasibility of Using Discriminate Pricing Schemes for Energy Trading in Smart Grid Feasblty of Usng Dscrmnate Prcng Schemes for Energy Tradng n Smart Grd Wayes Tushar, Chau Yuen, Bo Cha, Davd B. Smth, and H. Vncent Poor Sngapore Unversty of Technology and Desgn, Sngapore 138682. Emal:

More information

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent

More information

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng

More information

Canon NTSC Help Desk Documentation

Canon NTSC Help Desk Documentation Canon NTSC Help Desk Documentaton READ THIS BEFORE PROCEEDING Before revewng ths documentaton, Canon Busness Solutons, Inc. ( CBS ) hereby refers you, the customer or customer s representatve or agent

More information

Price Competition in an Oligopoly Market with Multiple IaaS Cloud Providers

Price Competition in an Oligopoly Market with Multiple IaaS Cloud Providers Prce Competton n an Olgopoly Market wth Multple IaaS Cloud Provders Yuan Feng, Baochun L, Bo L Department of Computng, Hong Kong Polytechnc Unversty Department of Electrcal and Computer Engneerng, Unversty

More information

Minimal Cost Data Sets Storage in the Cloud

Minimal Cost Data Sets Storage in the Cloud Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.1091

More information

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign PAS: A Packet Accountng System to Lmt the Effects of DoS & DDoS Debsh Fesehaye & Klara Naherstedt Unversty of Illnos-Urbana Champagn DoS and DDoS DDoS attacks are ncreasng threats to our dgtal world. Exstng

More information

Genetic Algorithm Based Optimization Model for Reliable Data Storage in Cloud Environment

Genetic Algorithm Based Optimization Model for Reliable Data Storage in Cloud Environment Advanced Scence and Technology Letters, pp.74-79 http://dx.do.org/10.14257/astl.2014.50.12 Genetc Algorthm Based Optmzaton Model for Relable Data Storage n Cloud Envronment Feng Lu 1,2,3, Hatao Wu 1,3,

More information

Network Security Situation Evaluation Method for Distributed Denial of Service

Network Security Situation Evaluation Method for Distributed Denial of Service Network Securty Stuaton Evaluaton Method for Dstrbuted Denal of Servce Jn Q,2, Cu YMn,2, Huang MnHuan,2, Kuang XaoHu,2, TangHong,2 ) Scence and Technology on Informaton System Securty Laboratory, Bejng,

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

J. Parallel Distrib. Comput. Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers

J. Parallel Distrib. Comput. Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers J. Parallel Dstrb. Comput. 71 (2011) 732 749 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. ournal homepage: www.elsever.com/locate/pdc Envronment-conscous schedulng of HPC applcatons

More information

Automated information technology for ionosphere monitoring of low-orbit navigation satellite signals

Automated information technology for ionosphere monitoring of low-orbit navigation satellite signals Automated nformaton technology for onosphere montorng of low-orbt navgaton satellte sgnals Alexander Romanov, Sergey Trusov and Alexey Romanov Federal State Untary Enterprse Russan Insttute of Space Devce

More information

A heuristic task deployment approach for load balancing

A heuristic task deployment approach for load balancing Xu Gaochao, Dong Yunmeng, Fu Xaodog, Dng Yan, Lu Peng, Zhao Ja Abstract A heurstc task deployment approach for load balancng Gaochao Xu, Yunmeng Dong, Xaodong Fu, Yan Dng, Peng Lu, Ja Zhao * College of

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

Portfolo and Grdiversion Technology

Portfolo and Grdiversion Technology Dstrbuted Portfolo and Investment Rs Analyss on Global Grds Rafael Moreno-Vozmedano, Krshna Nadmnt, Srumar Venugopal, Ana B. Alonso-Conde 3, Hussen Gbbns, and Rajumar Buyya Grd Computng and Dstrbuted Systems

More information

An Optimal Model for Priority based Service Scheduling Policy for Cloud Computing Environment

An Optimal Model for Priority based Service Scheduling Policy for Cloud Computing Environment An Optmal Model for Prorty based Servce Schedulng Polcy for Cloud Computng Envronment Dr. M. Dakshayn Dept. of ISE, BMS College of Engneerng, Bangalore, Inda. Dr. H. S. Guruprasad Dept. of ISE, BMS College

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

A Dynamic Load Balancing for Massive Multiplayer Online Game Server

A Dynamic Load Balancing for Massive Multiplayer Online Game Server A Dynamc Load Balancng for Massve Multplayer Onlne Game Server Jungyoul Lm, Jaeyong Chung, Jnryong Km and Kwanghyun Shm Dgtal Content Research Dvson Electroncs and Telecommuncatons Research Insttute Daejeon,

More information

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features On-Lne Fault Detecton n Wnd Turbne Transmsson System usng Adaptve Flter and Robust Statstcal Features Ruoyu L Remote Dagnostcs Center SKF USA Inc. 3443 N. Sam Houston Pkwy., Houston TX 77086 Emal: ruoyu.l@skf.com

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

Optimization Model of Reliable Data Storage in Cloud Environment Using Genetic Algorithm

Optimization Model of Reliable Data Storage in Cloud Environment Using Genetic Algorithm Internatonal Journal of Grd Dstrbuton Computng, pp.175-190 http://dx.do.org/10.14257/gdc.2014.7.6.14 Optmzaton odel of Relable Data Storage n Cloud Envronment Usng Genetc Algorthm Feng Lu 1,2,3, Hatao

More information

A Programming Model for the Cloud Platform

A Programming Model for the Cloud Platform Internatonal Journal of Advanced Scence and Technology A Programmng Model for the Cloud Platform Xaodong Lu School of Computer Engneerng and Scence Shangha Unversty, Shangha 200072, Chna luxaodongxht@qq.com

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Research of Network System Reconfigurable Model Based on the Finite State Automation

Research of Network System Reconfigurable Model Based on the Finite State Automation JOURNAL OF NETWORKS, VOL., NO. 5, MAY 24 237 Research of Network System Reconfgurable Model Based on the Fnte State Automaton Shenghan Zhou and Wenbng Chang School of Relablty and System Engneerng, Behang

More information

The Load Balancing of Database Allocation in the Cloud

The Load Balancing of Database Allocation in the Cloud , March 3-5, 23, Hong Kong The Load Balancng of Database Allocaton n the Cloud Yu-lung Lo and Mn-Shan La Abstract Each database host n the cloud platform often has to servce more than one database applcaton

More information

Resource Scheduling in Desktop Grid by Grid-JQA

Resource Scheduling in Desktop Grid by Grid-JQA The 3rd Internatonal Conference on Grd and Pervasve Computng - Worshops esource Schedulng n Destop Grd by Grd-JQA L. Mohammad Khanl M. Analou Assstant professor Assstant professor C.S. Dept.Tabrz Unversty

More information

A Resource-trading Mechanism for Efficient Distribution of Large-volume Contents on Peer-to-Peer Networks

A Resource-trading Mechanism for Efficient Distribution of Large-volume Contents on Peer-to-Peer Networks A Resource-tradng Mechansm for Effcent Dstrbuton of Large-volume Contents on Peer-to-Peer Networks SmonG.M.Koo,C.S.GeorgeLee, Karthk Kannan School of Electrcal and Computer Engneerng Krannet School of

More information

Financial Mathemetics

Financial Mathemetics Fnancal Mathemetcs 15 Mathematcs Grade 12 Teacher Gude Fnancal Maths Seres Overvew In ths seres we am to show how Mathematcs can be used to support personal fnancal decsons. In ths seres we jon Tebogo,

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

Dynamic Fleet Management for Cybercars

Dynamic Fleet Management for Cybercars Proceedngs of the IEEE ITSC 2006 2006 IEEE Intellgent Transportaton Systems Conference Toronto, Canada, September 17-20, 2006 TC7.5 Dynamc Fleet Management for Cybercars Fenghu. Wang, Mng. Yang, Ruqng.

More information

Complex Service Provisioning in Collaborative Cloud Markets

Complex Service Provisioning in Collaborative Cloud Markets Melane Sebenhaar, Ulrch Lampe, Tm Lehrg, Sebastan Zöller, Stefan Schulte, Ralf Stenmetz: Complex Servce Provsonng n Collaboratve Cloud Markets. In: W. Abramowcz et al. (Eds.): Proceedngs of the 4th European

More information

A New Task Scheduling Algorithm Based on Improved Genetic Algorithm

A New Task Scheduling Algorithm Based on Improved Genetic Algorithm A New Task Schedulng Algorthm Based on Improved Genetc Algorthm n Cloud Computng Envronment Congcong Xong, Long Feng, Lxan Chen A New Task Schedulng Algorthm Based on Improved Genetc Algorthm n Cloud Computng

More information

Performance Analysis of Energy Consumption of Smartphone Running Mobile Hotspot Application

Performance Analysis of Energy Consumption of Smartphone Running Mobile Hotspot Application Internatonal Journal of mart Grd and lean Energy Performance Analyss of Energy onsumpton of martphone Runnng Moble Hotspot Applcaton Yun on hung a chool of Electronc Engneerng, oongsl Unversty, 511 angdo-dong,

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

Set. algorithms based. 1. Introduction. System Diagram. based. Exploration. 2. Index

Set. algorithms based. 1. Introduction. System Diagram. based. Exploration. 2. Index ISSN (Prnt): 1694-0784 ISSN (Onlne): 1694-0814 www.ijcsi.org 236 IT outsourcng servce provder dynamc evaluaton model and algorthms based on Rough Set L Sh Sh 1,2 1 Internatonal School of Software, Wuhan

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) 2127472, Fax: (370-5) 276 1380, Email: info@teltonika.

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) 2127472, Fax: (370-5) 276 1380, Email: info@teltonika. VRT012 User s gude V0.1 Thank you for purchasng our product. We hope ths user-frendly devce wll be helpful n realsng your deas and brngng comfort to your lfe. Please take few mnutes to read ths manual

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers Ž. The Journal of Supercomputng, 15, 25 49 2000 2000 Kluwer Academc Publshers. Manufactured n The Netherlands. A Prefx Code Matchng Parallel Load-Balancng Method for Soluton-Adaptve Unstructured Fnte Element

More information

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process Dsadvantages of cyclc TDDB47 Real Tme Systems Manual scheduler constructon Cannot deal wth any runtme changes What happens f we add a task to the set? Real-Tme Systems Laboratory Department of Computer

More information

DBA-VM: Dynamic Bandwidth Allocator for Virtual Machines

DBA-VM: Dynamic Bandwidth Allocator for Virtual Machines DBA-VM: Dynamc Bandwdth Allocator for Vrtual Machnes Ahmed Amamou, Manel Bourguba, Kamel Haddadou and Guy Pujolle LIP6, Perre & Mare Cure Unversty, 4 Place Jusseu 755 Pars, France Gand SAS, 65 Boulevard

More information

An Ad Hoc Network Load Balancing Energy- Efficient Multipath Routing Protocol

An Ad Hoc Network Load Balancing Energy- Efficient Multipath Routing Protocol 246 JOURNA OF SOFTWAR, VO. 9, NO. 1, JANUARY 2014 An Ad Hoc Network oad alancng nergy- ffcent Multpath Routng Protocol De-jn Kong Shanx Fnance and Taxaton College, Tayuan, Chna mal: dejnkong@163.com Xao-lng

More information