A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning


Jianwu Wang 1, Yan Tang 2, Mai Nguyen 1, Ilkay Altintas 1
1 San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, USA, {jianwu, mhnguyen,
2 College of Computer and Information, Hohai University, Nanjing, China

Abstract: In the Big Data era, machine learning has more potential to discover valuable insights from the data. As an important machine learning technique, Bayesian Network (BN) has been widely used to model probabilistic relationships among variables. To deal with the challenges of Big Data BN learning, we apply distributed data-parallelism (DDP) and scientific workflow techniques to the BN learning process. We first propose an intelligent Big Data pre-processing approach and a data quality score to measure and ensure data quality and data faithfulness. Then, a new weight-based ensemble algorithm is proposed to learn a BN structure from an ensemble of local results. To easily integrate the algorithm with DDP engines, such as Hadoop, we employ the Kepler scientific workflow system to build the whole learning process. We demonstrate how Kepler can facilitate building and running our Big Data BN learning application. Our experiments show good scalability and learning accuracy when running the application in real distributed environments.

Keywords: Big Data; Bayesian network; Distributed computing; Ensemble learning; Scientific workflow; Kepler; Hadoop

I. INTRODUCTION

With the explosive growth of data encountered in our daily lives, we have entered the Big Data era [1]. For the United States health care sector alone, the creative and effective use of Big Data could bring more than $300 billion in potential value each year [35]. Making the best use of Big Data and releasing its value are core problems that researchers all over the world have sought to solve since the new millennium. Bayesian Network (BN), a probabilistic graph model, provides intuitive and theoretically solid mechanisms for processing uncertain information and presenting causalities among variables.
BN is an ideal tool for causal relationship modeling and probabilistic reasoning. BN is widely used in modeling [2][3][4], prediction [5][6][7], and risk analysis [8]. BNs have been applied in a wide range of domains such as health care, education, finance, environment, bioinformatics, telecommunication, and information technology [9]. With abundant data resources nowadays, learning BNs from Big Data could discover valuable business insights [4] and bring potential revenue value [8] to different domains.

To efficiently process large quantities of data, a scalable approach is needed. Although distributed data-parallelism (DDP) patterns, such as Map, Reduce, Match, CoGroup and Cross, are promising techniques to build scalable data-parallel analysis and analytics applications, applying the DDP patterns to Big Data BN learning still faces several challenges: (1) How can we effectively pre-process Big Data to evaluate its quality and reduce its size if necessary? (2) How to design a workflow capable of taking gigabytes of data and learning BNs with decent accuracy? (3) How to provide easy scalability support to BN learning algorithms? These three questions have not received substantial attention in current research. This is the main motivation for this research: the creation of the novel Scalable Bayesian Network Learning (SBNL) workflow. The SBNL workflow makes three novel contributions to the current literature:

Intelligent Big Data pre-processing through a proposed data quality score, called S_Arc, to measure and ensure data quality and data faithfulness.

Effective BN learning from Big Data by leveraging ensemble learning and a distributed computing model. A new weight-based ensemble algorithm is proposed to learn a BN structure from an ensemble of local results. This algorithm is implemented as an R package and is reusable by third parties.

A user-friendly approach to build and run scalable Big Data machine learning applications on top of DDP patterns and engines via scientific workflows.
Users do not need to write programs to fit the interfaces of different DDP engines. They only need to build algorithm-specific components using languages like R or Matlab, with which they are already familiar.

SBNL is validated using three publicly available data sets. SBNL obtains significant performance gains when applied in distributed environments while keeping the same learning accuracy, making SBNL an ideal workflow for Big Data Bayesian Network learning.

The remainder of this paper is organized as follows. Section II presents the background of BN learning techniques and

evaluation. Section III discusses how to build scalable Big Data applications via the Kepler scientific workflow system. Section IV describes the SBNL workflow in detail. The evaluation results and related work are presented in Sections V and VI, respectively. Section VII concludes this paper with future work.

II. BAYESIAN NETWORK LEARNING TECHNIQUES AND EVALUATION

A. Ensemble Learning

One trend in machine learning is to combine the results of multiple learners to obtain better accuracy. This trend is commonly known as ensemble learning. Ensemble learning leverages multiple models to obtain better predictive performance than could be obtained from any of the constituent models [19]. There is no definitive taxonomy for ensemble learning. Zenko [18] details four methods of combining multiple models: bagging, boosting, stacking and error-correcting output. In 2009, the Netflix challenge grand prize winner used an ensemble learning technique to build the most accurate predictive model for movie recommendation and won a million dollars 1. In this paper, we propose a weight-based algorithm to combine local and global learning results to learn a BN from Big Data. The main symbols used in the paper are listed in Table I.

TABLE I. SYMBOL TABLE

Symbol | Meaning
B | BN structure
D | Data set
P(B, D) | The joint probability of a BN structure (B) given the data set (D)
P(B) | Prior probability of the network B
N' | Prior equivalent sample size
X_i | Variable X_i
Pa(X_i) | Parents of X_i
Γ | Gamma function
N_D | The number of rows in D
N_arc | The total number of arcs in B
S_Arc | Arc Score

B. The BDe Score

Score functions describe how well a BN structure fits the data set. The BDe score function [9] provides a way to compute the joint probability of a BN structure (B) given the data set (D), denoted as P(B, D). The best BN structure maximizes the BDe score. Thus the BDe score is a good measure of data set quality. The BDe score function is defined for discrete variables. Let n be the total number of variables, q_i denote the number of configurations of the parent set Pa(X_i), and r_i denote the number of states of variable X_i.
The BDe score function is:

P(B, D) = P(B) · ∏_{i=1..n} ∏_{j=1..q_i} [ Γ(N'/q_i) / Γ(N'/q_i + N_ij) ] · ∏_{k=1..r_i} [ Γ(N'/(r_i q_i) + N_ijk) / Γ(N'/(r_i q_i)) ]   (1)

where N_ijk is the number of instances in D for which X_i = k and Pa(X_i) is in the j-th configuration, and

N_ij = Σ_{k=1..r_i} N_ijk   (2)

The BDe score function uses a parameter prior that is uniform and requires both a prior equivalent sample size N' and a prior structure. In practice, the BDe score function uses the value log(P(B, D)). The BDe score function is decomposable: the total score is the sum of the scores of the individual variables X_i. This study uses the BDe score to evaluate the learning accuracy of BN learning algorithms as well as data set quality (Section V.B).

In order to measure the quality of a data set D, we introduce a new measure called Arc Score, denoted as S_Arc:

S_Arc = P(B, D) / (N_D * N_arc)   (3)

where P(B, D) is the BDe score given data D and BN structure B, N_D is the number of rows in D, and N_arc is the total number of arcs in B. After an empirical study on known BN structures (Section V), we discover that data sets suitable for BN learning have comparatively high S_Arc. On the other hand, data sets that produce inferior BN structures have very low S_Arc, generally smaller than -0.5, reaching -1 or even -2. Hence, S_Arc can be used as an effective measure of data set quality.

In the situation where we only have a data set D, how can we obtain S_Arc? To address this problem, we can use an accurate BN learning algorithm like Max-Min Hill Climbing (MMHC) [10]. Because BN learning algorithms are neutral, given a good data set, a structure B with high S_Arc will be learned, and given a bad data set D, a structure B with low S_Arc will be learned. Therefore, we can use MMHC, one of the most accurate and robust algorithms in the BN learning algorithm comparison study [9], to learn a BN from any data set D and calculate the corresponding S_Arc.

C. Structural Hamming Distance

Structural Hamming Distance (SHD) [14] is an important measure to evaluate the quality of a learned BN.
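Once a (log) BDe score is available from an external learner, Eq. (3) is simple arithmetic. A minimal Python sketch, where the score value is assumed to come from a scorer such as MMHC with the BDe metric, and the numeric inputs below are illustrative only:

```python
def arc_score(bde_log_score: float, n_rows: int, n_arcs: int) -> float:
    """Arc Score S_Arc = P(B, D) / (N_D * N_arc), as in Eq. (3).

    bde_log_score: log BDe score of the learned structure B given D
                   (obtained from an external scorer, e.g. MMHC + BDe).
    n_rows:        number of rows N_D in the data set D.
    n_arcs:        number of arcs N_arc in the learned structure B.
    """
    if n_arcs == 0 or n_rows == 0:
        raise ValueError("need a non-empty data set and at least one arc")
    return bde_log_score / (n_rows * n_arcs)

# Illustrative values: a strongly negative per-row, per-arc score signals
# a poor-quality data set (cf. the paper's -0.5 rule of thumb).
good = arc_score(bde_log_score=-20000.0, n_rows=10000, n_arcs=40)    # -0.05
bad = arc_score(bde_log_score=-300000.0, n_rows=10000, n_arcs=40)    # -0.75
```

Normalizing by both N_D and N_arc makes scores comparable across data sets of different sizes and networks of different densities, which is what lets a single threshold act as a quality gate.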
The learned BN structure B is directly compared with the structure of a gold standard network (GSN) - the network with a known correct structure. Each arc in the learned structure fits one of the following cases:

Correct Arc: the same arc as in the GSN.
Added Arc (AA): an arc that does not exist in the GSN.
Missing Arc (MA): an arc that exists in the GSN but not in the learned structure.

1 Netflix prize, Netflix Inc,

Wrongly Directed Arc (WDA): an arc that exists in the GSN with the opposite direction.

Since some algorithms may return non-oriented edges when the direction of an edge cannot be statistically distinguished using orientation rules, the learned BN is a partially directed acyclic graph (PDAG). SHD is defined as the number of operators required to make two PDAGs identical: either by adding, removing, or reversing an arc, or by adding, removing or orienting an edge.

SHD = #AA + #MA + #WDA + #AE + #ME + #NOE   (4)

where AE is Added Edges, ME is Missing Edges and NOE is Non-Oriented Edges.

D. Bayesian Network Learning Algorithms

A comprehensive and comparative survey was carried out on BN learning algorithms [9]. Out of more than 50 learning algorithms, Max-Min Hill Climbing (MMHC) [10], the Three-Phase Dependency Analysis Algorithm (TPDA) [11] and the Recursive BN learning Algorithm (REC) [12] are shown to have superior learning accuracy and robustness.

Input:
  D: data set
  ε: threshold for the conditional independence test

MMHC(D, ε)
Phase I:
  1. For all variables X_i, set PC(X_i) = MMPC(X_i, D);
Phase II:
  2. Starting from an empty graph, perform greedy hill-climbing with operators add_edge, delete_edge and reverse_edge. Only try operator add_edge Y -> X if Y ∈ PC(X);
  Return the highest-scoring DAG found;

Fig. 1. The MMHC algorithm.

The Max-Min Hill Climbing (MMHC) algorithm [10] combines concepts from constraint-based [13] and search-and-score-based algorithms [11]. It takes as input a data set D and returns a BN structure with the highest score. MMHC is a two-phase algorithm: Phase I identifies the candidate sets for each variable X_i by calling a local-discovery algorithm called Max-Min Parents and Children (MMPC), discovering the BN's skeleton. Phase II performs a Bayesian-scoring, greedy hill-climbing search starting from an empty graph to orient and delete the edges. Fig. 1 presents a detailed description of the MMHC algorithm. The major structural search process of the MMHC algorithm is the MMPC procedure, which returns the parents and children of a target variable X, denoted as PC(X).
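The arc terms of the SHD in Eq. (4) can be computed with set operations. A minimal Python sketch for fully directed structures (the edge terms #AE, #ME and #NOE apply only to PDAGs with undirected edges and are omitted here; all function names are ours):

```python
def shd_directed(learned: set, gsn: set) -> int:
    """SHD between two directed graphs given as sets of (parent, child)
    arcs, per Eq. (4) restricted to the arc terms #AA + #MA + #WDA.

    An arc reversed with respect to the GSN counts once (as a WDA),
    not as one addition plus one deletion.
    """
    # Wrongly directed arcs: present in the learned graph, reversed in GSN.
    wda = {(a, b) for (a, b) in learned if (b, a) in gsn}
    # Added arcs: in the learned graph but absent (in any direction) in GSN.
    aa = {e for e in learned - gsn if e not in wda}
    # Missing arcs: in GSN but absent (in any direction) in the learned graph.
    ma = {(a, b) for (a, b) in gsn - learned if (b, a) not in learned}
    return len(aa) + len(ma) + len(wda)

# GSN: A->B, B->C; learned: A->B, C->B (reversed), A->C (added) => SHD = 2.
gsn = {("A", "B"), ("B", "C")}
learned = {("A", "B"), ("C", "B"), ("A", "C")}
```

Counting a reversal as a single operation matches the "reversing an arc" operator in the SHD definition above.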
By invoking MMPC with each variable as the target, one can identify all the edges that form the BN's skeleton.

III. BUILDING SCALABLE BIG DATA APPLICATIONS VIA THE KEPLER SCIENTIFIC WORKFLOW SYSTEM

A. Distributed Data-Parallel Patterns for Scalable Big Data Applications

Several DDP patterns, such as Map, Reduce, Match, CoGroup, and Cross, have been identified to easily build efficient and scalable data-parallel analysis and analytics applications [27]. DDP patterns enable programs to execute in parallel by splitting data in distributed computing environments. Originating from higher-order functional programming, each DDP pattern executes user-defined functions (UDFs) in parallel over input data sets. Since DDP execution engines often provide many features for execution, including parallelization, communication, and fault tolerance, application developers only need to select the appropriate DDP pattern for their specific data processing task and implement the corresponding UDFs.

Due to the increasing popularity and adoption of these DDP patterns, a number of execution engines have been implemented to support one or more of them. These DDP execution engines manage distributed resources and execute UDF instances in parallel. When running on distributed resources, DDP engines can achieve good scalability and performance acceleration. Hadoop is the most popular MapReduce execution engine. The Stratosphere system [27] supports five different DDP patterns. Many of the above DDP patterns are also supported by Spark 2. Since each DDP execution engine defines its own API for how UDFs should be implemented, an application implemented for one engine may be difficult to run on another engine.

B. Kepler Scientific Workflow

The Kepler scientific workflow system 3 is an open-source, cross-project collaboration to serve scientists from different disciplines [28]. Kepler adopts an actor-oriented modeling paradigm for the design and execution of scientific workflows. Kepler has been used in a wide variety of projects to manage, process, and analyze scientific data.

2 Spark Project: 3 Kepler Project: https://kepler-project.org/
Kepler provides a graphical user interface (GUI) for designing, managing and executing scientific workflows, which are structured sets of steps or tasks linked together to implement a computational solution to a scientific problem. In Kepler, Actors provide implementations of specific tasks and can be linked together via input and output Ports. Data is encapsulated in messages or Tokens and transferred between actors through ports. Actor execution is governed by Models of Computation (MoCs), called Directors in Kepler [29].

We found that the actor-oriented programming paradigm of Kepler fits the DDP framework very well [30]. Since each DDP pattern expresses an independent higher-order function, we define a separate DDP actor for each pattern. Unlike normal actors, these higher-order DDP actors do not process

its input data as a whole. Instead, they first partition the input data and then process each partition separately. The UDF for a DDP pattern is an independent component and can naturally be encapsulated within a DDP actor. The logic of the UDF can either be expressed as a sub-workflow or as compiled code. In the first case, users can compose a sub-workflow for their UDF via the Kepler GUI using specific subsidiary actors for the DDP pattern and any other general actors. Since the sub-workflow is not specific to any engine API, the same sub-workflow can be executed on different DDP engines. Like other actors, multiple DDP actors can be linked to construct bigger applications.

Each DDP pattern defines its execution semantics, i.e., how data partitions are processed by the pattern. This clear definition enables decoupling between a DDP pattern and its execution engines. To execute DDP workflows on different DDP execution engines, we have implemented a DDP director in Kepler. Currently, this director can execute DDP workflows with Hadoop, Stratosphere and Spark. At runtime, the director will detect the availability of DDP execution engines and transform workflows into their corresponding jobs. The adaptability of the director makes it user-friendly, since it hides the underlying execution engines from users.

C. Machine Learning Support in Kepler

There are many popular tools/languages for machine learning, such as R, Matlab, Python and Knime [31]. Complex machine learning applications might need to integrate different components implemented in different tools/languages. Kepler supports easy integration of these tools/languages within one process. Besides the ExternalExecution actor in Kepler, which invokes arbitrary binary tools in batch mode, we also have actors for many scripting languages. For instance, users can embed their own R scripts in the RExpression actor. Users can further customize the input/output ports of the RExpression actor to connect with other actors and build complex applications.
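The decoupling between a DDP pattern and its UDF described above can be illustrated in plain Python. This is a schematic, single-process sketch of the higher-order idea, not Kepler or Hadoop code, and all names are ours; a real engine would run the partitions in parallel across nodes:

```python
from functools import reduce as fold
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def ddp_map(udf: Callable[[A], B], partitions: Iterable[List[A]]) -> List[List[B]]:
    """Map pattern: apply a user-defined function to every record of every
    partition; partition boundaries are preserved."""
    return [[udf(rec) for rec in part] for part in partitions]

def ddp_reduce(udf: Callable[[B, B], B], partitions: Iterable[List[B]]) -> B:
    """Reduce pattern: fold a user-defined combiner over all records."""
    flat = [rec for part in partitions for rec in part]
    return fold(udf, flat)

# The same UDFs (the two lambdas) could be handed unchanged to any engine
# that implements the Map and Reduce patterns.
parts = [[1, 2, 3], [4, 5]]
squared = ddp_map(lambda x: x * x, parts)        # [[1, 4, 9], [16, 25]]
total = ddp_reduce(lambda a, b: a + b, squared)  # 55
```

The point of the sketch is that `ddp_map` and `ddp_reduce` own the iteration and partitioning while the UDFs stay engine-agnostic, which is exactly the property that lets Kepler swap Hadoop, Stratosphere or Spark underneath a workflow.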
In addition, we are investigating how to integrate other popular machine learning tools, such as Mahout 4, into Kepler. Users will be able to use their machine learning functions/libraries as actors and connect them with other actors.

IV. PROPOSED APPROACH

A. Overview of the SBNL Workflow

After introducing the background knowledge in the previous sections, we give an overview of our SBNL workflow.

4 Mahout: https://mahout.apache.org/

Fig. 2. Overview of the SBNL workflow (Big Data -> Quality Evaluation & Data Partitioning -> Local Learner with Local Ensemble Learning -> Master Learner with Master Ensemble Learning -> Final BN Structure, all built as a Kepler workflow).

As shown in Fig. 2, the SBNL workflow consists of four components: (1) data partitioning, (2) local learner, (3) master learner, (4) Kepler workflow. In the data partitioning component, the SBNL workflow partitions the data set into data partitions of reasonable size. SBNL has a score-based algorithm to dynamically determine the best partition size to balance learning complexity and accuracy. Data partitions are then sent evenly to each local learner. The local learner first uses the value of S_Arc to examine the data partition's quality. If the quality is good, SBNL enters the local ensemble learning (LEL) step: each local learner runs the MMHC algorithm separately on each local data partition to learn an individual BN. The local learner then applies our proposed ensemble method to the individual BNs to generate a final local BN. During local learning, the best local data partition is identified by each local learner.

Finally, the SBNL workflow reaches the master learner component. This component receives the local BNs and the best local data partitions from all local learners, from which the overall best data partition is obtained. The master learner then runs our proposed ensemble algorithm on the local BNs using the best data partition. Note that the master learner does not run any BN learning algorithm; it just assigns a weight to each local BN and ensembles the final BN.
Thus, all the computation-heavy tasks are distributed among the local learners. Details of each component of the SBNL workflow are specified in the following subsections.

B. Quality Evaluation and Data Partitioning

The first thing the SBNL workflow does is evaluate the quality of the given big data set. It runs a scoring algorithm to incrementally evaluate a partition D_p of the whole data set D. Each time, the scoring algorithm doubles the size of D_p until a threshold is reached. If the S_Arc value of D_p falls below -1.0, the whole data set will not be used for the SBNL workflow.

Facing Big Data larger than the memory size, a single machine cannot compute the Bayesian score of the whole data set. Therefore, we use a distributed computing model in the SBNL workflow. A big data set is partitioned into K slices of size N_s:

N_d = N_s * K + N_r   (5)

where N_d is the size of data set D and N_r is the number of rows remaining in D after K partitions. Counting the last N_r rows as another slice, the total number of slices is K+1. Given the total number of local learners, denoted as N_local, we try to send data slices evenly to the local learners for better load balance. An important task here is to determine N_s; we propose a fast incremental algorithm, FindNs, to find a proper partition size. FindNs is described in Table II.

TABLE II. THE FINDNS FUNCTION

function FindNs(data, maxstep, maxsize) {
  bestscore = 100;
  currentstep = 1;
  nrowdata = number of rows in data;
  ncoldata = number of columns in data;
  Ns = 1000 * (ncoldata %% 10);
  sliceddata = data[1:Ns, ];
  score = SarcCalculator(sliceddata) * (-1);
  while (score < bestscore && currentstep < maxstep && Ns < maxsize) {
    bestscore = score;
    Ns = Ns * 2;
    sliceddata = data[1:Ns, ];
    score = SarcCalculator(sliceddata) * (-1);
    currentstep = currentstep + 1;
  }
  return Ns;
}

function SarcCalculator(D, parameters) {
  network = mmhc(D, parameters);
  score = score(network, D, type = "BDe");
  SArc = score / (N_d * (# of arcs in network));
  return SArc;
}

The FindNs algorithm begins with the initial slice size:

N_s = 1000 * (n_coldata % 10)   (7)

It then doubles the value of N_s iteratively and evaluates the data partition (data[1:N_s]) until its S_Arc value can no longer be improved, or the maximum number of iterations or the maximum partition size is reached. In this way, the quality of each data partition is ensured by S_Arc and the partition size is controlled under a threshold.

C. Local Learner

The first activity in the local learner is Data Quality Evaluation (DQE). During DQE, each data partition is examined with the function SarcCalculator. If a data partition's S_Arc is less than -0.5, the partition is dropped by the SBNL workflow.
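The doubling search in FindNs can be sketched outside R as follows. This is a schematic re-implementation, not the paper's R package: the `quality` callable is a stand-in for SarcCalculator (it maps a slice size to the S_Arc of a BN learned from a slice of that size), while the initial-size rule follows Eq. (7); the example `quality` function below is purely synthetic:

```python
from typing import Callable

def find_ns(nrows: int, ncols: int, quality: Callable[[int], float],
            maxstep: int = 10, maxsize: int = 200_000) -> int:
    """Double the slice size until the negated Arc Score stops improving,
    or an iteration/size cap is hit (cf. FindNs in Table II)."""
    ns = 1000 * (ncols % 10)        # initial slice size, Eq. (7)
    best = float("inf")             # sentinel, like bestscore = 100
    score = -quality(min(ns, nrows))  # negate S_Arc: smaller is better
    step = 1
    while score < best and step < maxstep and ns < maxsize:
        best = score
        ns *= 2                     # double the slice size
        score = -quality(min(ns, nrows))
        step += 1
    return ns

# Synthetic quality curve: S_Arc improves toward -0.1 as the slice grows,
# then plateaus; the search stops at the first non-improving doubling.
best_ns = find_ns(nrows=10_000_000, ncols=37,
                  quality=lambda n: -max(0.1, 2000.0 / n))
```

With 37 columns the initial size is 7000; the score improves at 14000 and 28000, plateaus thereafter, and the search returns the last doubled size (56000), mirroring Table II, which likewise returns the final doubled Ns rather than the last strictly improving one.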
After DQE, the local learner enters its second activity: Local Ensemble Learning (LEL), shown in Table III. In LEL, the first step is learning local BNs from the data partitions using the MMHC algorithm. This step also examines each data partition and selects the best partition. Then, LEL calculates learning scores for the local BNs using the best data partition.

TABLE III. LOCAL ENSEMBLE LEARNING

LocalEnsembleLearning(dataPartitions) {
  # Initialization
  localBNs = learn BNs from dataPartitions using MMHC;
  learnscores = BDe scores of localBNs using best data partition;
  finallocalBN = ensembleBNs(localBNs, learnscores, bestpartition);
}

ensembleBNs(localBNs, learnscores, bestpartition) {
  weights = weightCalculator(learnscores, bestpartition);
  mergedmatrix = matrix(0, nnodes, nnodes);
  # Transform and merge local BNs
  for (n in 1:length(localBNs)) {
    adjmatrix = BNToAdjMatrix(localBNs[n]);
    mergedmatrix = adjmatrix * weights[n] + mergedmatrix;
  }
  # Transform merged matrix into final local BN
  minthreshold = min(weights);
  finallocalBN = MergedMatrixToBN(mergedmatrix, minthreshold * 2);
  return finallocalBN;
}

Assigning a weight to each individual learner is an important technique in ensemble learning. LEL leverages this weighting technique in a method called ensembleBNs. In ensembleBNs, a weight vector is calculated from the local scores. For example, if localscores = [-0.2, -0.3, -0.25, -0.25], then the corresponding normalized weight vector is [0.306, 0.204, 0.245, 0.245]: a local score that is smaller in absolute value gets a higher weight. After obtaining the weights, ensembleBNs transforms the local BNs into adjacency matrices and merges them into one matrix using the weight vector. In the end, ensembleBNs uses the merged matrix to generate the final local BN. A threshold is set as the minimal value in the weight vector. ensembleBNs then iterates over the merged matrix and identifies an arc when mergedmatrix[i, j] > minthreshold * 2. This is a voting mechanism that promotes an arc when it is present in more than two local BNs.

D.
Master Learner

After describing the local learner, we now introduce the final component of the SBNL workflow: the master learner. The master learner adopts a similar strategy to the local learner and reuses the function ensembleBNs. There are two inputs: the final local BNs, denoted as S_local, and the best local partitions, denoted as D_localbest. The master learner contains four steps: 1) obtain the global best partition D_best from D_localbest; 2) calculate scores for S_local using D_best; 3) call the ensembleBNs function to obtain the final BN; 4) return the final BN as the learning result of the SBNL workflow from Big Data.
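The inverse-score weighting and threshold vote used by ensembleBNs in both the local and the master learner can be sketched in Python. This is our re-implementation of the idea with our own function names, using the paper's own example scores; the weight rule (normalized inverse of the absolute score) is inferred from that worked example:

```python
def weight_calculator(scores):
    """Normalized inverse-magnitude weights: a BDe score smaller in
    absolute value (a better local BN) gets a larger weight."""
    inv = [1.0 / abs(s) for s in scores]
    total = sum(inv)
    return [w / total for w in inv]

def merge_adjacency(adj_matrices, weights):
    """Weighted vote over local adjacency matrices: keep an arc only if
    its accumulated weight exceeds twice the minimum weight, i.e. it is
    backed by more than two local BNs."""
    n = len(adj_matrices[0])
    merged = [[0.0] * n for _ in range(n)]
    for adj, w in zip(adj_matrices, weights):
        for i in range(n):
            for j in range(n):
                merged[i][j] += adj[i][j] * w
    threshold = 2 * min(weights)
    return [[1 if merged[i][j] > threshold else 0 for j in range(n)]
            for i in range(n)]

weights = weight_calculator([-0.2, -0.3, -0.25, -0.25])
# -> approximately [0.306, 0.204, 0.245, 0.245], as in the paper's example
```

As a usage check, an arc present in three of the four local BNs accumulates well over the 2 * min(weights) threshold and survives the vote, while an arc proposed by a single learner does not.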

TABLE IV. MASTER LEARNER

CentralEnsembleLearner(BN_local, D_localbest) {
  obtain the best data partition D_best from D_localbest;
  scores = BDe scores of BN_local using D_best;
  finalBN = ensembleBNs(BN_local, scores, D_best);
  return finalBN;
}

E. SBNL Workflow in Kepler

We build our SBNL workflow by embedding the above components in Kepler, as shown in Fig. 3. All the code snippets (namely Tables II, III and IV) are implemented in an R package that forms the core of the Kepler big data BN learning workflow. The main actors of the top-level workflow, shown in Fig. 3 (a), are the PartitionData and DDPNetworkLearner actors. The first is an RExpression actor that includes the R scripts for the data partitioning component in Fig. 2. The main parts of this script are provided in Table II.

Fig. 3 (a): Top-level SBNL workflow.
Fig. 3 (b): DDP sub-workflow.
Fig. 3 (c): Local learner sub-workflow in Map.
Fig. 3 (d): Master learner sub-workflow in Reduce.
Fig. 3. SBNL workflow in Kepler.

DDPNetworkLearner is a composite actor whose sub-workflow is shown in Fig. 3 (b). Map and Reduce DDP actors are used here to achieve parallel local learner execution and sequential master learner execution. The DDP Director is used to manage the sub-workflow execution by communicating with the underlying DDP engines. The DDPDataSource actor reads the partitions generated by the PartitionData actor and sends each partition to a local learner instance that runs across the computing nodes. The sub-workflow of the Map actor, shown in Fig. 3 (c), mainly calls an RExpression actor to run the Local Learner R script. The main parts of this script are provided in Table III. The sub-workflow of the Reduce actor, shown in Fig. 3 (d), mainly calls an RExpression actor to run the Master Learner R script. The main parts of this script are provided in Table IV. Based on the dependency between the Map and Reduce actors in Fig. 3 (b), the DDP Director can manage their executions so that the Reduce actor is only executed after the Map actor finishes all local learner processing.

This workflow demonstrates how Kepler can facilitate building parallel network learner algorithms. The DDP framework of Kepler provides basic building blocks for the DDP patterns and supports the dependencies between them. The RExpression actor can easily integrate user R scripts with other parts of the workflow. Kepler also provides subsidiary actors, such as Expression and DDPDataSource, to support operations needed for a complete and executable workflow. Overall, Kepler users can build scalable network learner workflows without writing programs other than the needed R scripts.

V. EVALUATION

The evaluation results of SBNL are presented in this section. Several big data sets are used to evaluate SBNL. The goal of the evaluation is to address the following questions:

1. When constructing SBNL, what is the best slice size for each big data set?
2. On all big data sets, does the SBNL workflow achieve good learning accuracy with significant performance improvement?

A brief description of the data sets and the threshold selection study are presented in Subsection A. Subsection B answers the two questions above.

A. Background

The background of the empirical study is described in detail in this subsection. First, the data sets are described and the evaluation measures are presented. Then the threshold selection study is shown. The machine specification for the evaluation of all results is as follows: four compute nodes in a cluster environment are employed, where each node has two eight-core 2.6 GHz CPUs and 64 GB memory. Each node can access the input data via a shared file system.

1) Data sets and measurements

Three large data sets are used in this empirical study. A brief description of each data set is presented below. Properties of all data sets are summarized in Table V.

TABLE V. DATA SETS

Data set | #Rows (million) | #Arcs | #Variables | Data size (GB)
Alarm10M | 10 | 46 | 37 | 1.9
HailFinder10M | 10 | 66 | 56 | —
Insurance10M | 10 | 52 | 27 | 1.9

Alarm: a medical BN for patient monitoring. HailFinder: a BN that forecasts severe summer hail in northeastern Colorado. Insurance: an adaptive BN modeling the car insurance problem.

All data sets are generated from these well-known Bayesian networks using logic sampling [20]: for the Alarm network, the data set contains 10 million rows and is called Alarm10M; similarly, the data set for the HailFinder network is called HailFinder10M and the data set for the Insurance network is called Insurance10M. Since each data set contains 10 million rows, all of the data set sizes exceed the normal data set size applicable for BN learning. It is very time consuming, and sometimes infeasible, to learn a BN from most of these data sets using a traditional BN learning algorithm.

2) Threshold Selection Study

To measure the BN structures learned by SBNL, we use the BDe score and SHD described in Sections II.B and II.C. In Section II, two functions are described: SarcCalculator calculates the Arc Score to measure the quality of a data set D, and FindNs uses SarcCalculator to find the ideal data slice size N_s. It is critical to study and verify the correctness of the function SarcCalculator to make sure that the data pre-processing phase of SBNL is sound and practical. To evaluate the correctness of S_Arc, we used six different data sets: three good data sets without any noise, followed by three bad data sets with 5% noise, one from each BN listed in Table VI. We then calculate S_Arc for each data set and compare it with the S_Arc of the gold standard network (GSN). The SHD is listed for each learned BN. Table VI shows that, given a good data set, the value of S_Arc is very close to the S_Arc of the GSN. This indicates that S_Arc is indeed an accurate measure of data set quality. Furthermore, it is observed that bad data sets with noise have very low S_Arc, generally lower than -0.5, and the SHD of the corresponding learned structure is far from the correct structure. The column Select indicates whether SBNL selects the data set in the DQE activity.

TABLE VI. SARC OF SIX DIFFERENT DATA SETS

Data set | Rows (K) | S_Arc (MMHC) | S_Arc (GSN) | Select | SHD
Alarm_good | — | — | — | Yes | 4
HailFinder_good | — | — | — | Yes | 26
Insurance_good | — | — | — | Yes | 9
Alarm_Bad | — | — | — | No | 12
HailFinder_Bad | — | — | — | No | 58
Insurance_Bad | — | — | — | No | 21

According to Table VI, we can claim that SBNL has 100% data selection accuracy in its local learner component. Therefore, we conclude that S_Arc is an accurate measure of the faithfulness of a data set D. After running FindNs on the three big data sets, we obtain N_s for each big data set (Table VII). Note that the S_Arc values shown in Table VII are very close to the S_Arc of the GSN listed in Table VI. This ensures the correctness of the partition size N_s.

TABLE VII. ACCURACY RESULTS OF THREE NETWORKS

Network | N_s | S_Arc
Alarm | — | —
HailFinder | — | —
Insurance | — | —

B. Experiments

We conducted our experiments using four compute nodes in a cluster environment. The tests were done with Hadoop version 2.2. In the tests, one node is assigned to task coordination and the others to worker tasks. We ran our workflow with different numbers of worker nodes to see the scalability of the executions and how performance changes. We also implemented an R program that only uses the original MMHC algorithm for the network learning task. Because the R program has no parallel execution across multiple nodes, no data partition step is needed and it can only run on one node. Its execution time is the baseline for the performance comparisons.

We ran our experiments with three data sets, whose execution information is shown in Table VIII, from which we can see that our workflow achieved good scalability when running on more worker nodes. Although our SBNL workflow has an additional step for data partitioning, its execution times are still better than the baseline execution. The overall performance shows less improvement as the worker node number increases. This is because some steps of the workflow (data partitioning, master learner) cannot utilize the distributed environment for parallel execution. We plan to speed up the data partitioning step by utilizing the parallel data loading and partitioning capability of HDFS 5. We will also run experiments with bigger data sets in larger environments.

TABLE VIII. EXECUTION PERFORMANCE OF THE NETWORK ANALYSIS WORKFLOW AND BASELINE R PROGRAM (UNIT: MINUTES)

Data set | Baseline (16 Core) | 32 Core | 48 Core | 64 Core
Alarm5M (936 MB) | — | — | — | —
Alarm10M (1.9 GB) | — | — | — | —
Insurance10M (1.9 GB) | — | — | — | —

We first give the HailFinder data set to the SBNL workflow. In the first data evaluation actor, the magnitude of the S_Arc value of HailFinder remains very high, around 1.5 (i.e., S_Arc ≈ -1.5), so the SBNL workflow determines that the HailFinder data set is not suitable for BN learning. To confirm this, we further apply a HailFinder data set to the MMHC

5 Hadoop Distributed File System (HDFS):

algorithm. The learned BN is very different from the actual HailFinder network, with over 30 missing arcs. This study affirms the correctness of the SBNL workflow: low-quality data sets are rejected at the start by SBNL so as to ensure good learning results.

We also evaluated the Alarm and Insurance data sets; both have good quality. The accuracy analysis is summarized in Table IX. The Alarm10M data set is partitioned into 208 partitions and the Insurance10M data set into 625 partitions. For the Alarm10M data set, we compare SBNL's result with that of a single data set (Alarm96K) applied directly to the MMHC algorithm on a single machine. Similarly, for the Insurance10M data set, we compare SBNL's result with that of a single data set (Insurance16K) applied to the MMHC algorithm.

TABLE IX. NETWORK ACCURACY ANALYSIS

Data set | S_Arc | AA | MA | SHD
Alarm10M (SBNL) | — | — | — | —
Alarm96K (Single) | — | — | — | —
Insurance10M (SBNL) | — | — | — | —
Insurance16K (Single) | — | — | — | —

The Alarm data set has good data quality, with an S_Arc very low in magnitude; therefore, the learned BN is close to the actual network. The best partition size of the Alarm data set is listed in Table VII. We use a separate Alarm data set, Alarm96K, to compare SBNL's accuracy. It is observed that after applying the Alarm10M data set to SBNL, we learn a BN with 37 correct arcs and zero missing arcs, with a structural Hamming distance of nine. This is close to the learning result on the Alarm96K data set, showing the good learning accuracy of the SBNL workflow. Note that there are no added arcs; this is due to the ensemble weighting mechanism of SBNL, which selects popular arcs discovered by the local learners, resulting in a very compact BN with most of the correct arcs. On the other hand, the Insurance data set has a higher S_Arc magnitude, so its learning accuracy is not as good as for the Alarm network. The best partition size of Insurance10M is listed in Table VII. It can be observed that the learning results for the Insurance10M data set with the SBNL workflow are similar to those for the Insurance16K data set. Again, this comparison confirms the learning accuracy of the SBNL workflow.

VI.
RELATED WORK

To efficiently manage the massive amounts of data encountered in big data applications, approaches to in-situ analytics have been investigated. Zou et al. explore the use of data reduction via online data compression in [32] and apply this idea to large-scale remote visual data exploration [33]. Our approach addresses the data set size problem by using a pre-processing technique to eliminate poor-quality data, and by leveraging an ensemble model coupled with distributed processing.

Learning a BN from data is a traditional research area with a long history. Chickering et al. [16] show that finding the optimal BN structure in the graph search space is NP-hard. A comprehensive comparative survey of BN learning algorithms was carried out in [9]. The majority of learning algorithms are not designed for Big Data BN learning: the number of possible BN structures grows super-exponentially with the number of variables, and large data sets can hardly fit in the memory of a single machine. It is therefore advisable to learn a BN from Big Data through distributed computing methods in a divide-and-conquer fashion. Chen et al. [13] study the problem of learning the structure of a BN from distributed heterogeneous data sources, but their approach focuses on learning sparsely connected networks with different features at each site. In 2010, Na and Yang proposed a method for learning the structure of a BN from distributed data sources [14], but their local learning uses the K2 algorithm, which has only moderate accuracy, and the approach does not scale to big data sets. In 2011, Tamada et al. proposed a parallel algorithm for learning optimal BN structure [15], but it is limited to optimal structure search, which is not suitable for large data sets with millions of records. In the Big Data BN learning area, current research focuses mainly on methods for distributed computing and scale-up implementation. To the best of our knowledge, this research is the first to bring the workflow concept into Big Data BN learning, which is a key contribution over the existing research.
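The divide-and-conquer pattern discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: learn_local() is a toy stand-in for a real structure learner such as MMHC, and in SBNL the per-partition calls would run as parallel Hadoop tasks rather than a local loop.

```python
# Minimal sketch of divide-and-conquer BN structure learning: each partition
# is handled by an independent local learner (the "map" side; in SBNL these
# are Hadoop tasks running MMHC), and the local structures are then combined
# (the "reduce" side). learn_local() here is a toy stand-in, NOT MMHC.

def learn_local(partition):
    # Toy local learner: treat each consecutive pair of variables as an arc.
    return {(a, b) for a, b in zip(partition, partition[1:])}

def learn_distributed(partitions):
    # Map phase: one local structure per data partition.
    local_structures = [learn_local(p) for p in partitions]
    # Reduce phase: combine local structures; a plain union here, whereas
    # SBNL uses a weight-based ensemble to keep only popular arcs.
    return set().union(*local_structures)

parts = [["A", "B", "C"], ["A", "B", "D"]]
print(sorted(learn_distributed(parts)))  # [('A', 'B'), ('B', 'C'), ('B', 'D')]
```

The point of the pattern is that only small local structures, not raw data, cross machine boundaries, which is what lets the approach scale to data sets that cannot fit in a single machine's memory.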
There are several studies on scaling up machine learning applications. The MapReduce framework has been shown to be broadly applicable to many machine learning algorithms [26]. Das et al. use a JSON query language, called Jaql, as a bridge between R and Hadoop [23]; it provides a new package for HDFS operations. Ghoting and Pednault propose Hadoop-ML, an infrastructure on which developers can build task-parallel or data-parallel machine learning algorithms on program blocks under the language runtime environment [24]. Budiu et al. demonstrate how to use DryadLINQ for machine learning applications such as decision tree induction and k-means [34]. Yet the learning curves of these tools are relatively steep, since researchers have to learn the architectures and interfaces to implement their own data mining algorithms. Wegener et al. introduce a system architecture for GUI-based data mining of large data on MapReduce clusters that overcomes the limitations of data mining toolkits [25]; it uses an implementation based on Weka and Hadoop to verify the architecture. This work is similar to ours in that both provide GUI support and Hadoop integration. Our work targets another popular machine learning and data mining tool, namely R, and our framework can adapt to different DDP engines. There are also machine learning workflow tools such as KNIME and IPython Notebook. For instance, KNIME provides many machine learning packages, yet its Big Data extension is currently limited to Hadoop/HDFS access. We have not seen how DDP patterns/sub-workflows are supported in these workflow tools.
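In contrast with the surveyed systems, SBNL's distinguishing step is its weight-based ensemble over local results. The following is an illustrative reconstruction, not the paper's exact algorithm: the weighting scheme and the popularity threshold are assumptions made for the sketch, and each arc is modeled simply as a (parent, child) pair.

```python
# Illustrative weight-based ensemble over local results: each local learner
# contributes a weight and a set of directed arcs, and only arcs whose
# weighted vote share reaches a popularity threshold survive. This mirrors
# the behavior described earlier (compact networks with no added arcs), but
# the actual weighting scheme and threshold are assumptions here.
from collections import defaultdict

def ensemble_arcs(local_results, threshold=0.5):
    """local_results: list of (weight, arc_set); an arc is (parent, child)."""
    votes = defaultdict(float)
    total = sum(weight for weight, _ in local_results)
    for weight, arcs in local_results:
        for arc in arcs:
            votes[arc] += weight
    # Keep only arcs that are popular across the weighted local learners.
    return {arc for arc, v in votes.items() if v / total >= threshold}

local_results = [(1.0, {("A", "B"), ("B", "C")}),
                 (1.0, {("A", "B"), ("C", "D")}),
                 (1.0, {("A", "B"), ("B", "C")})]
print(sorted(ensemble_arcs(local_results)))  # [('A', 'B'), ('B', 'C')]
```

Because an arc must be rediscovered by enough (weighted) local learners, spurious arcs found on a single partition are filtered out, which is consistent with the compact learned networks reported in the experiments.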

VII. CONCLUSIONS

In the Big Data era, techniques for processing and analyzing data must work in contexts where the data set consists of millions of samples and the amount of data is measured in petabytes. By combining machine learning, distributed computing and workflow techniques, we designed a Scalable Bayesian Network Learning (SBNL) workflow. The workflow includes intelligent Big Data pre-processing and effective BN learning from Big Data that leverages ensemble learning and a distributed computing model. We also illustrate how the Kepler scientific workflow system can easily provide scalability to Bayesian network learning. It should be noted that this approach can be applied to many other machine learning techniques as well, to make them scalable and Big Data ready.

For future work, we plan to improve the performance of the data partition step by integrating the current data partition approach with HDFS to achieve parallel data partitioning and loading. We also plan to apply our work to bigger data sets with more distributed resources to further verify its scalability.

ACKNOWLEDGMENT

This work is supported by the Natural Science Foundation of Jiangsu Province, China under grant No. BK and the National Science Foundation, U.S. under grants DBI and .

REFERENCES

[1] R. Lu, H. Zhu, X. Liu, J. K. Liu, J. Shao, "Toward efficient and privacy-preserving computing in big data era," IEEE Network, Vol. 28, Issue 4, 2014.
[2] Y. Zhang, Y. Zhang, E. Swears, N. Larios, Z. Wang, Q. Ji, "Modeling Temporal Interactions with Interval Temporal Bayesian Networks for Complex Activity Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, Issue 10, 2013.
[3] M. Neil, C. Xiaoli, N. Fenton, "Optimizing the Calculation of Conditional Probability Tables in Hybrid Bayesian Networks Using Binary Factorization," IEEE Transactions on Knowledge and Data Engineering, Vol. 24, Issue 7, 2012.
[4] M. Gui, A. Pahwa, S. Das, "Bayesian Network Model With Monte Carlo Simulations for Analysis of Animal-Related Outages in Overhead Distribution Systems," IEEE Transactions on Power Systems, Vol. 26, Issue 3, 2011.
[5] N. E. Fenton, M. Neil, "A critique of software defect prediction models," IEEE Transactions on Software Engineering, Vol. 25, Issue 5.
[6] S. Sun, C. Zhang, G. Yu, "A Bayesian network approach to traffic flow forecasting," IEEE Transactions on Intelligent Transportation Systems, Vol. 7, Issue 1.
[7] K. Dejaeger, T. Verbraken, B. Baesens, "Toward Comprehensible Software Fault Prediction Models Using Bayesian Network Classifiers," IEEE Transactions on Software Engineering, Vol. 39, Issue 2.
[8] M. Neil and N. Fenton, "Using Bayesian Networks to Model the Operational Risk to Information Technology Infrastructure in Financial Institutions," Journal of Financial Transformation, Vol. 22.
[9] Y. Tang, K. Cooper, C. Cangussu, "Bayesian Belief Network Structure Learning Algorithms," Technical Report UTDCS, University of Texas at Dallas.
[10] I. Tsamardinos, L. E. Brown, C. F. Aliferis, "The max-min hill-climbing Bayesian network structure learning algorithm," Machine Learning, Vol. 65, Issue 1.
[11] J. Cheng, R. Greiner, J. Kelly, D. A. Bell, W. Liu, "Learning Bayesian networks from data: An information-theory based approach," Artificial Intelligence, Vol. 137.
[12] X. Xie, Z. Geng, "A Recursive Method for Structural Learning of Directed Acyclic Graphs," Journal of Machine Learning Research, Vol. 9.
[13] R. Chen, K. Sivakumar, H. Kargupta, "Learning Bayesian network structure from distributed data," In Proceedings of the 3rd SIAM International Data Mining Conference.
[14] Y. Na, J. Yang, "Distributed Bayesian network structure learning," In Proceedings of the 2010 IEEE International Symposium on Industrial Electronics (ISIE).
[15] Y. Tamada, S. Imoto, S. Miyano, "Parallel Algorithm for Learning Optimal Bayesian Network Structure," Journal of Machine Learning Research, Vol. 12.
[16] D. M. Chickering, D. Geiger, D. Heckerman, "Learning Bayesian networks is NP-hard," Technical Report MSR-TR-94-17, Microsoft Research.
[17] D. Opitz and R. Maclin, "Popular ensemble methods: An empirical study," Journal of Artificial Intelligence Research, Vol. 11.
[18] B. Zenko, "A comparison of stacking with meta decision trees to bagging, boosting, and stacking with other methods," In Proceedings of the IEEE International Conference on Data Mining (ICDM 2001).
[19] K. Monteith, J. L. Carroll, K. Seppi, T. Martinez, "Turning Bayesian Model Averaging into Bayesian Model Combination," In Proceedings of the International Joint Conference on Neural Networks (IJCNN'11).
[20] J. A. Hoeting, D. Madigan, A. E. Raftery, C. T. Volinsky, "Bayesian Model Averaging: A Tutorial," Statistical Science, Vol. 14, Issue 4.
[21] M. Scutari, Bayesian Network Repository.
[22] I. Beinlich, H. R. Suermondt, R. M. Chavez, G. Cooper, "The ALARM monitoring system: a case study with two probabilistic inference techniques for belief networks," In Proceedings of Artificial Intelligence in Medical Care.
[23] S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson, "Ricardo: Integrating R and Hadoop," In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '10).
[24] A. Ghoting and E. Pednault, "Hadoop-ML: An Infrastructure for the Rapid Implementation of Parallel Reusable Analytics," In Proceedings of the Large-Scale Machine Learning: Parallelism and Massive Data Sets Workshop (NIPS '09).
[25] D. Wegener, M. Mock, D. Adranale, S. Wrobel, "Toolkit-Based High-Performance Data Mining of Large Data on MapReduce Clusters," In Proceedings of the International Conference on Data Mining Workshops (ICDMW '09).
[26] C. Chu, S. K. Kim, Y. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, K. Olukotun, "Map-Reduce for machine learning on multicore," in Advances in Neural Information Processing Systems 19.
[27] D. Battre, S. Ewen, F. Hueske, O. Kao, V. Markl, D. Warneke, "Nephele/PACTs: A programming model and execution framework for web-scale analytical processing," In Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC '10), ACM.
[28] B. Ludaescher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger-Frank, M. Jones, E. A. Lee, J. Tao, Y. Zhao, "Scientific workflow management and the Kepler system," Concurrency and Computation: Practice & Experience, Special Issue on Scientific Workflows, Vol. 18, Issue 10.
[29] A. Goderis, C. Brooks, I. Altintas, E. Lee, C. Goble, "Heterogeneous composition of models of computation," Future Generation Computer Systems, Vol. 25, Issue 5, 2009.

[30] J. Wang, D. Crawl, I. Altintas, W. Li, "Big Data Applications using Workflows for Data Parallel Computing," Computing in Science & Engineering, Vol. 16, Issue 4, July-Aug. 2014, IEEE.
[31] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, C. Sieb, K. Thiel, B. Wiswedel, "KNIME: The Konstanz Information Miner," Studies in Classification, Data Analysis, and Knowledge Organization.
[32] H. Zou, F. Zheng, M. Wolf, G. Eisenhauer, K. Schwan, H. Abbasi, Q. Liu, N. Podhorszki, S. Klasky, "Quality-Aware Data Management for Large Scale Scientific Applications," In Proceedings of High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion.
[33] H. Zou, M. Slawinska, K. Schwan, M. Wolf, G. Eisenhauer, F. Zheng, J. Dayal, J. Logan, S. Klasky, T. Bode, M. Kinsey, M. Clark, "FlexQuery: An Online In-situ Query System for Interactive Remote Visual Data Exploration at Large Scale," In Proceedings of the 2013 IEEE International Conference on Cluster Computing (Cluster 2013), pp. 1-8.
[34] M. Budiu, D. Fetterly, M. Isard, F. McSherry, Y. Yu, "Large-scale machine learning using DryadLINQ," in R. Bekkerman, M. Bilenko, J. Langford (Eds.), Scaling up Machine Learning: Parallel and Distributed Approaches, Cambridge University Press, pp. 49-68.
[35] W. Raghupathi, V. Raghupathi, "Big data analytics in healthcare: promise and potential," Health Information Science and Systems, Vol. 2, Issue 1, 3, 2014.


More information

Overview of monitoring and evaluation

Overview of monitoring and evaluation 540 Toolkt to Combat Traffckng n Persons Tool 10.1 Overvew of montorng and evaluaton Overvew Ths tool brefly descrbes both montorng and evaluaton, and the dstncton between the two. What s montorng? Montorng

More information

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System Mnng Feature Importance: Applyng Evolutonary Algorthms wthn a Web-based Educatonal System Behrouz MINAEI-BIDGOLI 1, and Gerd KORTEMEYER 2, and Wllam F. PUNCH 1 1 Genetc Algorthms Research and Applcatons

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

Gender Classification for Real-Time Audience Analysis System

Gender Classification for Real-Time Audience Analysis System Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmal.com,

More information

Development of an intelligent system for tool wear monitoring applying neural networks

Development of an intelligent system for tool wear monitoring applying neural networks of Achevements n Materals and Manufacturng Engneerng VOLUME 14 ISSUE 1-2 January-February 2006 Development of an ntellgent system for tool wear montorng applyng neural networks A. Antć a, J. Hodolč a,

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

An Inductive Fuzzy Classification Approach applied to Individual Marketing

An Inductive Fuzzy Classification Approach applied to Individual Marketing An Inductve Fuzzy Classfcaton Approach appled to Indvdual Marketng Mchael Kaufmann, Andreas Meer Abstract A data mnng methodology for an nductve fuzzy classfcaton s ntroduced. The nducton step s based

More information

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers Ž. The Journal of Supercomputng, 15, 25 49 2000 2000 Kluwer Academc Publshers. Manufactured n The Netherlands. A Prefx Code Matchng Parallel Load-Balancng Method for Soluton-Adaptve Unstructured Fnte Element

More information

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems 1 Applcaton of Mult-Agents for Fault Detecton and Reconfguraton of Power Dstrbuton Systems K. Nareshkumar, Member, IEEE, M. A. Choudhry, Senor Member, IEEE, J. La, A. Felach, Senor Member, IEEE Abstract--The

More information

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment Survey on Vrtual Machne Placement Technques n Cloud Computng Envronment Rajeev Kumar Gupta and R. K. Paterya Department of Computer Scence & Engneerng, MANIT, Bhopal, Inda ABSTRACT In tradtonal data center

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS

PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS Yunhong Xu, Faculty of Management and Economcs, Kunmng Unversty of Scence and Technology,

More information

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters Frequency Selectve IQ Phase and IQ Ampltude Imbalance Adjustments for OFDM Drect Converson ransmtters Edmund Coersmeer, Ernst Zelnsk Noka, Meesmannstrasse 103, 44807 Bochum, Germany edmund.coersmeer@noka.com,

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

RequIn, a tool for fast web traffic inference

RequIn, a tool for fast web traffic inference RequIn, a tool for fast web traffc nference Olver aul, Jean Etenne Kba GET/INT, LOR Department 9 rue Charles Fourer 90 Evry, France Olver.aul@nt-evry.fr, Jean-Etenne.Kba@nt-evry.fr Abstract As networked

More information

A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems

A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems A Cost-Effectve Strategy for Intermedate Data Storage n Scentfc Cloud Workflow Systems Dong Yuan, Yun Yang, Xao Lu, Jnjun Chen Faculty of Informaton and Communcaton Technologes, Swnburne Unversty of Technology

More information

Implementation of Deutsch's Algorithm Using Mathcad

Implementation of Deutsch's Algorithm Using Mathcad Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Cloud-based Social Application Deployment using Local Processing and Global Distribution

Cloud-based Social Application Deployment using Local Processing and Global Distribution Cloud-based Socal Applcaton Deployment usng Local Processng and Global Dstrbuton Zh Wang *, Baochun L, Lfeng Sun *, and Shqang Yang * * Bejng Key Laboratory of Networked Multmeda Department of Computer

More information

An Integrated Approach of AHP-GP and Visualization for Software Architecture Optimization: A case-study for selection of architecture style

An Integrated Approach of AHP-GP and Visualization for Software Architecture Optimization: A case-study for selection of architecture style Internatonal Journal of Scentfc & Engneerng Research Volume 2, Issue 7, July-20 An Integrated Approach of AHP-GP and Vsualzaton for Software Archtecture Optmzaton: A case-study for selecton of archtecture

More information

A Dynamic Load Balancing for Massive Multiplayer Online Game Server

A Dynamic Load Balancing for Massive Multiplayer Online Game Server A Dynamc Load Balancng for Massve Multplayer Onlne Game Server Jungyoul Lm, Jaeyong Chung, Jnryong Km and Kwanghyun Shm Dgtal Content Research Dvson Electroncs and Telecommuncatons Research Insttute Daejeon,

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

Reporting Forms ARF 113.0A, ARF 113.0B, ARF 113.0C and ARF 113.0D FIRB Corporate (including SME Corporate), Sovereign and Bank Instruction Guide

Reporting Forms ARF 113.0A, ARF 113.0B, ARF 113.0C and ARF 113.0D FIRB Corporate (including SME Corporate), Sovereign and Bank Instruction Guide Reportng Forms ARF 113.0A, ARF 113.0B, ARF 113.0C and ARF 113.0D FIRB Corporate (ncludng SME Corporate), Soveregn and Bank Instructon Gude Ths nstructon gude s desgned to assst n the completon of the FIRB

More information

The Load Balancing of Database Allocation in the Cloud

The Load Balancing of Database Allocation in the Cloud , March 3-5, 23, Hong Kong The Load Balancng of Database Allocaton n the Cloud Yu-lung Lo and Mn-Shan La Abstract Each database host n the cloud platform often has to servce more than one database applcaton

More information

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and POLYSA: A Polynomal Algorthm for Non-bnary Constrant Satsfacton Problems wth and Mguel A. Saldo, Federco Barber Dpto. Sstemas Informátcos y Computacón Unversdad Poltécnca de Valenca, Camno de Vera s/n

More information

Searching for Interacting Features for Spam Filtering

Searching for Interacting Features for Spam Filtering Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software

More information

All Roads Lead to Rome: Optimistic Recovery for Distributed Iterative Data Processing

All Roads Lead to Rome: Optimistic Recovery for Distributed Iterative Data Processing All Roads Lead to Rome: Optmstc Recovery for Dstrbuted Iteratve Data Processng Sebastan Schelter Stephan Ewen Kostas Tzoumas Volker Markl Technsche Unverstät Berln, Germany frstname.lastname@tu-berln.de

More information

Watermark-based Provable Data Possession for Multimedia File in Cloud Storage

Watermark-based Provable Data Possession for Multimedia File in Cloud Storage Vol.48 (CIA 014), pp.103-107 http://dx.do.org/10.1457/astl.014.48.18 Watermar-based Provable Data Possesson for Multmeda Fle n Cloud Storage Yongjun Ren 1,, Jang Xu 1,, Jn Wang 1,, Lmng Fang 3, Jeong-U

More information

iavenue iavenue i i i iavenue iavenue iavenue

iavenue iavenue i i i iavenue iavenue iavenue Saratoga Systems' enterprse-wde Avenue CRM system s a comprehensve web-enabled software soluton. Ths next generaton system enables you to effectvely manage and enhance your customer relatonshps n both

More information

A Dynamic Energy-Efficiency Mechanism for Data Center Networks

A Dynamic Energy-Efficiency Mechanism for Data Center Networks A Dynamc Energy-Effcency Mechansm for Data Center Networks Sun Lang, Zhang Jnfang, Huang Daochao, Yang Dong, Qn Yajuan A Dynamc Energy-Effcency Mechansm for Data Center Networks 1 Sun Lang, 1 Zhang Jnfang,

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal peter.vortsch@ptv.de Peter Möhl, PTV AG,

More information