Use of Numerical Models as Data Proxies for Approximate Ad-Hoc Query Processing


Preprint UCRL-JC-??????

Use of Numerical Models as Data Proxies for Approximate Ad-Hoc Query Processing

R. Kamimura, G. Abdulla, C. Baldwin, T. Critchlow, B. Lee, I. Lozares, R. Musick, and N. Tang

U.S. Department of Energy
Lawrence Livermore National Laboratory

This article appears in the Proceedings of the 7th Joint Conference on Information Systems, Research Triangle Park, NC, September

Approved for public release; further dissemination unlimited

DISCLAIMER

This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes.

This is a preprint of a paper intended for publication in a journal or proceedings. Since changes may be made before publication, this preprint is made available with the understanding that it will not be cited or reproduced without the permission of the author.

This report has been reproduced directly from the best available copy.

Available electronically at

Available for a processing fee to U.S. Department of Energy and its contractors in paper from
U.S. Department of Energy, Office of Scientific and Technical Information, P.O. Box 62, Oak Ridge, TN
Telephone: (865)
Facsimile: (865)
E-mail: reports@adonis.osti.gov

Available for the sale to the public from
U.S. Department of Commerce, National Technical Information Service, 5285 Port Royal Road, Springfield, VA
Telephone: (800)
Facsimile: (703)
E-mail: orders@ntis.fedworld.gov
Online ordering:

OR

Lawrence Livermore National Laboratory Technical Information Department's Digital Library

Use of Numerical Models as Data Proxies for Approximate Ad-Hoc Query Processing

R. Kamimura (1), G. Abdulla (1), C. Baldwin (1), T. Critchlow (1), B. Lee (2), I. Lozares (1), R. Musick (3), N. Tang (1)

(1) CASC, Lawrence Livermore National Laboratory, CA
(2) Department of Computer Science, University of Vermont, VT
(3) Ikun, Inc., CA

Abstract

As datasets grow beyond the gigabyte scale, there is an increasing demand for techniques for working and interacting with them. To this end, the DataFoundry team at the Lawrence Livermore National Laboratory has developed a software prototype called the Approximate Ad-hoc Query Engine for Simulation Data (AQSim). The goal of AQSim is to provide a framework that allows scientists to interactively perform ad-hoc queries over terabyte-scale datasets using numerical models as proxies for the original data. The advantages of this system are several. First, by storing only the model parameters, each dataset occupies a smaller footprint compared to the original, increasing the shelf life of such datasets before they are sent to archival storage. Second, the models are geared towards approximate querying as they are built at different resolutions, allowing the user to make the tradeoff between model accuracy and query response time. This gives the user greater opportunities for exploratory data analysis. Lastly, several different models are allowed, each focusing on a different characteristic of the data, thereby enhancing the interpretability of the data compared to the original. The focus of this paper is on the modeling aspects of the AQSim framework.

1. Introduction

Advances in computer technology and decreasing hardware costs have made it easier to measure, generate, and store large volumes of data. While this is viewed as an information bonanza, the result has often been a flood of data with limited means of making effective use of it. The central problem is that of size.
When datasets reach the gigabyte level and beyond, storing the information and interacting with it become increasingly constrained in space, in time, or in some cases both. While research is being conducted on mass storage devices, there will always be limited resources to accommodate the dataflow. A related issue is data compression, since ideally one would like to store as little information as possible to maximize the effectiveness of the hardware. But even if the spatial constraints are resolved, the issue of how to interact with the data within a reasonable period of time still remains. Data analysis tasks such as asking an ad-hoc query become ineffective for large-scale datasets if the time to find and retrieve the exact answer to the query is longer than the user is willing to tolerate. This is especially true if the scientist is in an exploratory data analysis mode and has several questions to examine.

To deal with the time constraint problem, one approach that has been used recently is approximate querying systems, which seek to provide not an exact solution to the posed query but an approximate one. The rationale is that the user may prefer a fast, approximate answer as they explore the dataset to develop a better understanding of truly interesting queries and/or regions. To this end, Acharya et al. (1999), Acharya et al. (1999a), Chakrabarti et al. (2000), Ioannidis and Poosala (1999), Vitter and Wang (1999), and Vitter and Wang (1998) have developed different approaches for dealing with aggregate and range queries in an approximate fashion. However, these approaches focused on fast query response and did not specifically address the issue of storing such large datasets. This latter point is important if the user wishes to revisit the dataset which, owing to its size, will often have to be moved to tertiary storage, as is often the case with scientific simulations. Accessing the data at such locations is often prohibitive time-wise and should be avoided if possible.
As a result, what is needed is a system that not only allows approximate ad-hoc queries but also provides a compact representation of the data that is far easier to store and access compared to the original. A survey of the literature uncovers that Hachem et al. (1995) have implemented a system where polynomial regression models are used to model non-spatial data and the regression coefficients are stored in the database. In response to a query, the data are reconstructed from the models. Such a system provides the benefits of approximate query response but at a far smaller storage cost.

Following a similar approach independently, the DataFoundry team at Lawrence Livermore National Laboratory has developed the Approximate Ad-hoc Query Engine for Simulation Data (AQSim). The goal of AQSim is to provide a software framework that not only allows relatively fast interactive ad-hoc querying of terabyte-scale data but also provides a compact representation of the data. This is achieved by constructing mathematical models of the data at different resolutions. In effect, these models act as data proxies and stand in for the data in response to a query submission. This has several benefits. First, the different resolutions of the data allow the user to trade off time for accuracy. Second, along with an index file, only the model parameters are stored in a file. As the number of parameters is often smaller than the original dataset size, the shelf life of the dataset can be extended. The user can send the original data to archival storage but can still query using the model parameters for data reconstruction. Finally, as mathematical models often characterize different facets of the data, the model parameters themselves may provide additional insights not apparent when dealing with just the original raw values. Thus, in contrast to the work of Hachem et al., where only one model is used to address the different query needs of the scientists, AQSim uses multiple models.

[Figure 1. Framework for AQSim. The diagram has two phases, preprocessing and query processing. Components shown: mesh conversion, mesh partitioner, modeler, index constructor, metadata, index files, model files, query engine, index searcher, predicate processor/data value reconstructor, and visualization.]

While AQSim is built for dealing with the spatio-temporal nature of the scientific simulations for the Accelerated Strategic Computing Initiative (ASCI) program, the framework is generic enough to be applied to other domains that deal with terabyte-scale datasets. In this paper the focus is on the modeling aspect of the AQSim framework. Section 2 provides a brief overview of the entire system. Section 3 discusses the modeling issues that need to be addressed. Current results of AQSim are presented in Section 4. Finally, conclusions and future work are discussed in Section 5.

2. AQSim Framework

Figure 1 shows the AQSim framework under construction. The system is divided into two phases: preprocessing and query processing. The goal of the first phase is to take the original data and build the model files and index structures necessary to support the query-processing step that follows it. As the focus is on developing a compact and accurate data proxy, more time and computational effort is placed in this initial period. The rationale is to concentrate the computational burden when the data is close at hand and still away from the user. In this way, the time spent during query processing, which is where the user is expected to spend the bulk of their time, can be minimized.

The input to the preprocessing is the original data itself, the selection of models to be implemented, and a partitioning scheme. To achieve the approximate query answer capability, the data is statistically characterized and modeled at different resolutions. These resolutions are created by partitioning the data into progressively smaller subunits and then modeling each of these partitions with the selected models.
This procedure is repeated until the partitioning reaches a selected threshold, upon which it is terminated. The current partitioning scheme is to bisect the data along each of the spatio-temporal coordinates (x, y, z, t), thereby creating a 16-way tree. The rationale was to take advantage of the spatio-temporal nature of the scientific simulations that were being studied. In any event, the end result is a tree structure. The root node represents the entire dataset, with each of its children (and subsequent children) representing a different subset of the data. Each node contains the following statistics about the data that was in it: number of data points, number of models built on the data in that node, the types of models used, model errors, and, for each variable, its corresponding mean, standard deviation, minimum, and maximum values. As will be discussed later, the index searcher uses this information when searching for a response to a query.

If directly implemented as stated above, one problem that is immediately encountered is that it is unlikely that a model built with a majority of the data is accurate or that it can be built in a reasonably short time. Hence, to balance the time-accuracy tradeoff, the key is to fix the model complexity for each node. The supposition is that as the data becomes increasingly partitioned, the localization will increase the likelihood that a model of fixed complexity will be able to model the behavior reasonably well. An analogy is to break a nonlinear curve into piecewise components which, depending on the length of the component, can be sufficiently linear for linear regression models to be built on them. Naturally, there is no guarantee that the accuracy will reach what is desired, but this combination of partitioning and modeling is being used as a first approximation. As will be discussed in Section 5, identifying metrics for when to build models, as opposed to having them built at every node, is a research topic. If such a metric is found, it can decrease storage and computational costs by building only those models that are needed.
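The partitioning just described can be illustrated with a minimal sketch. It assumes in-memory points stored as dictionaries, a bounding-box midpoint as the bisection point, and hypothetical names (`Node`, `summarize`, `build_tree`, `min_points`, `max_depth`); none of these come from AQSim itself, and the model-related node fields (model types, model errors) are left out for brevity.

```python
from dataclasses import dataclass, field
from statistics import fmean, pstdev

COORDS = ("x", "y", "z", "t")  # spatio-temporal coordinates used for bisection


@dataclass
class Node:
    count: int                      # number of data points in the node
    stats: dict                     # variable -> (mean, std, min, max)
    children: list = field(default_factory=list)


def summarize(points, variables):
    """Per-variable (mean, standard deviation, minimum, maximum)."""
    stats = {}
    for v in variables:
        vals = [p[v] for p in points]
        stats[v] = (fmean(vals), pstdev(vals), min(vals), max(vals))
    return stats


def build_tree(points, variables, min_points=4, max_depth=3, depth=0):
    """Bisect along x, y, z, t (up to 16 children) until a threshold is hit."""
    node = Node(count=len(points), stats=summarize(points, variables))
    if len(points) <= min_points or depth >= max_depth:
        return node
    # Assumption: split at the midpoint of the bounding box per coordinate.
    mids = {c: (min(p[c] for p in points) + max(p[c] for p in points)) / 2
            for c in COORDS}
    buckets = {}
    for p in points:
        key = tuple(p[c] > mids[c] for c in COORDS)  # one of 2**4 = 16 cells
        buckets.setdefault(key, []).append(p)
    if len(buckets) == 1:           # all points share the same coordinates
        return node
    for key in sorted(buckets):     # empty cells are simply not created
        node.children.append(
            build_tree(buckets[key], variables, min_points, max_depth, depth + 1))
    return node


# Toy data: one point at every corner of the unit hypercube in (x, y, z, t).
points = [{"x": x, "y": y, "z": z, "t": t, "temp": float(x + y + z + t)}
          for x in (0, 1) for y in (0, 1) for z in (0, 1) for t in (0, 1)]
root = build_tree(points, ["temp"])
print(root.count, len(root.children), root.stats["temp"])
```

In this toy case the root summarizes all 16 points and each of its 16 children holds a single point; a real index would stop at a coarser threshold and attach model parameters to each node.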
Once the partitioning and model building is done, the output from the preprocessing stage is an index file containing all the node information along with the corresponding model files. It is important to note that the original data is not stored in either the index or model files. Each node contains only the information listed above along with pointers to its children. The model files contain only information about the model parameters that were used for each node. Hence, it is hoped that these files will be smaller in size than the original data. So, if a particular section of the data is to be reconstructed, one needs to know which node and which model to use to reconstruct that particular subsection.

The query-processing phase is the side that deals primarily with the user and is the means by which they present queries to the system. After the index and model files have been initialized and read in by the system, the user can pose ad-hoc queries to the query engine. The query engine then identifies the variables and predicates in the query and selects an index to use based on the metadata. An index searcher then examines the index nodes, using the statistical information contained within them to determine which nodes (and models) are most relevant to the query. A list of candidate nodes is then sent to a predicate processor/data value reconstructor, which reconstructs the data and formulates a response. During this period, the user can specify the accuracy of the response, and this information is used to determine how deep within the tree to search. Once the data has been reconstructed, it is sent to a visualization tool to view the results.

3. Modeling Issues

As the models act as proxies for the data, it is essential that a set of criteria or metrics be developed to assess the feasibility of using a particular model for the AQSim system. Initially, our focus was on traditional modeling criteria such as model error and complexity, as these are often used to assess model performance. Unfortunately, due to the scale of the data involved and the tight coupling with respect to queries, these criteria were found not to be particularly informative.
For example, model error as a measure of model accuracy was relevant, but its importance was somewhat reduced due to the multiresolutional structure of the tree, which reconstructs the data, in principle, at a higher accuracy (barring overfitting) as the leaves of the tree are approached. But as this is nevertheless an important measure, mean square error or root mean square error was selected as the model error metric to establish a standard. Similarly, model complexity was also viewed at first as important, but metrics such as the Akaike or Bayesian information criteria are typically used to compare among a set of models, balancing model simplicity with error (Myung, 2000). The drawback was that the size of the datasets precludes building several models or even iterating over the same model to obtain a better fit. As such, a new set of criteria has been developed, focusing primarily on the computational and storage cost of building models as well as their applicability for dealing with various query types. The former is termed model building, whereas model queryability is used for the latter.

3.1 Model building

Here, the goal is to assign some semi-quantitative measure to how computationally expensive building a model can be, as well as the associated cost of storing the model parameters. Assume that $y_i$ is the i-th variable of the original data and x is a vector of independent variable(s) associated with $y_i$. x is denoted as a vector to emphasize generality: x can be another variable in the dataset (univariate model), a collection of other variables (multivariate model), or simply an index.
With this in mind, the model is formulated as follows:

$$y_i = f(x; \theta) \tag{1}$$

where f denotes the model itself and θ is a vector of model parameters (coefficients). The computational cost of building the model (equation 1) per node is expressed in big-O notation (Cormen et al., 1998):

$$O(F(n)) + r \cdot O(G(n)) \tag{2}$$

where n represents the number of samples in a given node; O(F(n)) represents the cost of building the initial model, where F(n) also includes any model-specific preprocessing steps; r is the number of steps needed to refine the model to either a desired level of complexity or error (r > 0); and O(G(n)) is the cost of the model refinement steps, where G(n) represents the functional form of the cost as a function of the number of samples. G(n) may be the same as F(n) or functionally different. F(n) and G(n) represent generic functional forms (e.g., they can be n, n log n, n^2, etc.). The term r encapsulates the iterative nature of the refinement process, dealing with issues such as increasing the number of knots in a B-spline or the processing steps for determining the appropriate coefficient threshold values to achieve a target error. If the user desires a more accurate model, the price is the additional computational time required to refine the model.

Once the model has been built, there is the issue of the storage size of the model. The storage cost per index tree node, s, can be estimated as

$$s = p \cdot v + i \tag{3}$$

where i is the size of the index mapping, v is the number of variables, and p is the parameter size per variable, which is the size of θ (number of coefficients, for example) multiplied by the size of the data type used (e.g., integer, float, double). In general, the parameter size cannot be readily estimated since it depends on the data distribution and the level of error specified. However, if the model complexity is set, the size can be estimated, as the user explicitly limits the number of parameters per variable. Finally, the index mapping may or may not exist depending on the model type but is presented for completeness.
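As a rough illustration of how equations 2 and 3 might be evaluated for a single node, the sketch below treats the growth functions F(n) and G(n) as operation counts with constant factors dropped. The function names, the choice F(n) = n log2 n and G(n) = n, and all numeric inputs are assumptions made for this example only.

```python
import math


def build_cost(n, F, G, r):
    """Equation 2 with constants dropped: F(n) + r * G(n)."""
    return F(n) + r * G(n)


def storage_cost(coeffs_per_variable, bytes_per_coeff, num_variables,
                 index_mapping_bytes=0):
    """Equation 3: s = p * v + (size of the index mapping)."""
    p = coeffs_per_variable * bytes_per_coeff   # parameter size per variable
    return p * num_variables + index_mapping_bytes


def nlogn(n):
    return n * math.log2(n)


def linear(n):
    return n


# Hypothetical node sizes: a parent with 1024 samples and a child with 64.
cost_parent = build_cost(1024, F=nlogn, G=linear, r=3)
cost_child = build_cost(64, F=nlogn, G=linear, r=3)

# Hypothetical model: 8 double-precision coefficients for each of 5 variables,
# plus 16 bytes of index bookkeeping.
bytes_per_node = storage_cost(8, 8, 5, index_mapping_bytes=16)

print(cost_parent, cost_child, bytes_per_node)
```

The functional form of the build cost is the same for both nodes; only n changes, which is why the estimate for the smaller node is lower.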
The index mapping represents any additional bookkeeping that may be required to reconstruct the data.

Model building trade-offs

The above presents a formalism to estimate the cost of model construction in both memory and computation. The following general observations can be made:

- As partitioning occurs, n becomes smaller, so the computational cost goes down, although functionally equation 2 remains the same.
- If lower modeling error is desired, this usually translates into an increase in r in equation 2, that is, more iterations to bring the model to convergence. θ is also likely (but not certain) to increase in size as additional parameters are needed to bring the model to target error levels.
- Depending on the modeling algorithm used, the number of iterations and the parameter size may or may not be tied to each other. In the case of regression, additional model terms are introduced to reduce model error, so the iteration count and parameter size are linked. On the other hand, with neural networks, the weights of the system do not change in number, but their values are adjusted with each iteration cycle.

To summarize, to assess the full impact of model building, the following need to be known:

- Initial model cost: O(F(n))
- Model refinement cost: O(G(n))
- Model complexity (related to error as well): this sets r and the size of θ
- Bookkeeping: is it needed and, if so, how big

Once this is known, equations 2 and 3 can be used to estimate cost and storage requirements on a per-node basis for a given model. This information then forms a basis by which different models can be compared on model construction and memory storage criteria. To consider the effect of the entire dataset, multiply the above estimates (equations 2 and 3) by the number of nodes in the index tree and adjust for n in the different nodes.

Model queryability

As mentioned previously, it is not sufficient that the model simply reconstruct the data to within a small error, since the gigabyte-plus scale of the data makes building such models computationally expensive. Ideally, the most useful models are the ones that not only reconstruct data well but are also amenable to dealing with a variety of queries. It is this issue of queryability that is discussed in this section. In brief, queryability considers the question, "Short of reconstructing the data, how can the model structure or functionality be used to answer the query?"

All the models in AQSim have the capability of reconstructing the data to varying degrees of accuracy. However, some models will capture different aspects of the data structure in their formulation than others. For example, in the case of simple linear regression, y = mx + b, the parameter m contains information on the rate of change of y with respect to x. This ease of information access is what needs to be characterized in order to effectively determine which model to use to answer a query. Ideally, the goal is to arrive at a mapping that would associate particular models with certain query types. To achieve this, a set of criteria is needed to assess the feasibility of different model types in the context of the queries that may be presented.
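The linear regression example can be made concrete with a small sketch. It uses an ordinary least-squares fit written out by hand on made-up data; the function name `fit_line` and the sample values are assumptions for illustration, not part of AQSim. The point is that a rate-of-change question can be answered by reading the stored parameter m, without reconstructing any data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Made-up data in one partition; only (m, b) would be kept as the model.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(xs, ys)

# Query "how fast does y change with x?": answered directly by the parameter m.
rate_of_change = m

# Query "what is y at x = 1.5?": answered by reconstructing from the model.
y_at_1_5 = m * 1.5 + b

print(rate_of_change, y_at_1_5)
```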
As AQSim will incorporate multiple models, understanding the applicability or relevance a particular model may have in the context of responding to a query is essential. A survey of the literature does not reveal any type of model/query association metrics and points to a need to develop one for this system. In addition, considering the variety and complexity of queries that may be asked, it is unlikely that a single quantitative metric can be arrived at to measure queryability. Instead, it may be more feasible to break the assessment into semi-qualitative and semi-quantitative components. The former focuses on breaking down the query types into different categories such as the ones listed below. (This is a rough list and is not meant to be exhaustive or mutually exclusive.)

1) range: for a given input, find the associated output
2) aggregate: find how many data samples satisfy a given query predicate
3) scale-dependent patterns/dynamicity: find similar types of behavior at the specified scale
4) topological: find the samples that are within a specified target range
5) variable interactions: find the data samples that satisfy the specified variable relationships
6) area/volume effects: find the area or volume for a particular variable condition
7) domain-specific: user defines a function of interest in the query predicate

Each model candidate is graded on its ability to deal with the different categories, specifically whether it is able to address such queries (apart from brute-force data reconstruction) and how accurately. An additional consideration may be to assign a weight to the different query types according to how likely they are to occur and/or how important they are. Ideally, the models that are selected for AQSim should be complementary with respect to each other to minimize redundant coverage. In this regard, models that are domain-specific should probably always be included, since it is assumed that their construction contains domain-relevant information which would not be readily available from the original data.
But again, this need may have to be balanced against how often that kind of information is requested.

The semi-quantitative aspect may be viewed as the number of data processing steps and/or the computational time required to process the model to arrive at the desired output. For example, suppose the query is of a range type, specifically "Find all x for this value of y", and two models exist: one is a spline regression (y = f(x)) and the other is a wavelet (x = f(i), y = f(i)). Estimating how long it would take each model to arrive at an answer is an interesting metric. The former would require solving for the roots of the polynomial, whereas the wavelets may simply reconstruct all the data for the given range and then apply a filtering operation to isolate the relevant answers. It should be emphasized that this type of analysis is semi-quantitative in nature, since it is highly dependent on the distribution of the data and the initial starting points.

Model-query interaction

This section focuses on the cost of interacting with a model to get the desired answer to the query. The knowledge gained here allows models to be compared on the basis of how computationally expensive they are when dealing with queries. Assume that the user poses the query in the form, "Given $y_i$, provide $h(x_i)$." Again, $y_i$ denotes a variable of the original dataset. $h(x_i)$ is presented here as the generic output needed to answer the query. In its simplest form, $h(x_i) = x_i$, where $x_i$ is similar to the x used in equation 1, with the focus on either a single original variable or a collection of variables (since the index is not part of the original data, the user is not likely to make any inquiries about it). In a more advanced form, $h(x_i)$ may be a user-defined function that takes $x_i$ as input. To answer the query (find $h(x_i)$ for $y_i$), the procedure is to start with what the model provides and then process the results until the desired information becomes available.
The cost of answering a query can essentially be estimated by equation 4 (Cormen et al., 1998):

$$\sum_{i=1}^{m} O(f_i(n)) = O\!\left(\sum_{i=1}^{m} f_i(n)\right) \tag{4}$$

where m is the number of processing steps required, $O(f_i(n))$ is the computational cost associated with each processing step, and n is the number of samples. It is assumed that the processing costs are linearly additive.

The key lies in what is meant by a processing step. Consider that there exists a model for a single variable, x = f(i), where i is an index, and another model for a different variable, y = g(j), where j is a different index. For a "Find the x associated with y" query, this would require creating or learning the mapping between i and j, say i = m(j), if the appropriate x-y pairs are to be uncovered. This mapping would be a processing step in addition to the one where, for the given y, the j's would have to be generated, $j = g^{-1}(y)$, another processing step. Answering the query requires that both steps be processed.

As another example, assume that x of equation 1 is some multivariate collection and again the query is "Find the x associated with $y_i$". This can be answered by solving the inverse of equation 1 and, for illustrative purposes, it is further assumed that this is O(n^2):

$$x = f^{-1}(y_i) \tag{5}$$

This is not the only approach to solving the problem. A naïve but simple alternative would be to reconstruct the data for all available $y_i$ and sift through it to look for the x's of interest. Assume for the sake of argument that this is O(n log n) for the generation and O(n log n) for the sifting. Now, it is not clear which approach, the solver or the naïve one, is better. The solver has one processing step, but it is computationally expensive. The naïve way involves two steps, but individually they are not as expensive. So, depending on the route taken, as well as the sample size involved, the cost can vary. Equation 4 should therefore be evaluated on both routes to obtain an estimate of the cost, with the goal of selecting the processing route that is cheaper.

Given two models, it is important to determine how much it would cost to use one model over another by breaking it down into the number of processing steps required to generate the appropriate response and the cost associated with each step for each model. The final decision is based on choosing the lesser total of the two. Note that this analysis can be expanded to go further than just finding x and can also consider the cost of generating $h(x_i)$.

The formulation of equation 4 is such that it can be used to evaluate model performance against a range of query types. This is done through the formulation of $h(x_i)$. For range queries, it may be sufficient to have $h(x_i) = x_i$.
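The comparison between the solver route and the naïve route can be sketched numerically. The snippet below evaluates equation 4 for each route with constant factors dropped, using the illustrative costs assumed in the text (one O(n^2) step versus two O(n log n) steps); the names `route_cost`, `ROUTES`, and `cheapest_route`, and the use of a base-2 logarithm, are assumptions for this example.

```python
import math


def route_cost(steps, n):
    """Equation 4 with constants dropped: sum the per-step costs f_i(n)."""
    return sum(f(n) for f in steps)


# Each route is a list of per-step growth functions.
ROUTES = {
    # One expensive step: solve the inverse of the model.
    "solver": [lambda n: n ** 2],
    # Two cheaper steps: reconstruct the data, then sift through it.
    "naive": [lambda n: n * math.log2(n), lambda n: n * math.log2(n)],
}


def cheapest_route(n):
    """Name of the route with the lowest estimated cost for n samples."""
    return min(ROUTES, key=lambda name: route_cost(ROUTES[name], n))


for n in (3, 1024):
    print(n, {name: route_cost(steps, n) for name, steps in ROUTES.items()},
          cheapest_route(n))
```

Under these assumed costs the solver is cheaper only for very small n, and the two-step naïve route wins as the sample size grows, which illustrates why the estimate has to be made per route and per sample size.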
To examine how the models fare on topological queries, choose the appropriate $h(x_i)$ and then estimate the costs.

This analysis, however, is not complete, because the model also has its parameters θ. While the emphasis has largely been on obtaining $h(x_i)$ from $x_i$, it is possible that the parameters themselves readily lend themselves to $h(x_i)$, and this should be considered when evaluating a model. For example, dynamicity is reflected in the wavelet coefficients, while information such as rate of change can be observed in the coefficients of linear regression models. Note that equation 4 simply focuses on the processing steps and is not concerned with whether they involve parameters or x's. The complementary criterion to the model building described in Section 3.1 is then to select the model with the lowest cost for a desired query type or types.

In summary, for each model under consideration, it is recommended to first use equations 2, 3, and 4 to assess the computational cost and memory needs as the model is used against different query types (if applicable). Note that these equations have some adjustable parameters, such as m and r, and these will depend on the accuracy sought and the query being asked. In general, the model that has the lowest cost in computation and memory for a given query is a strong candidate. This analysis has not considered tree search costs, since these cannot be determined from the model structure itself, but they may prove to be substantial for some models (to obtain a target accuracy) compared to others. The final assessment should also consider this search cost once the index searching protocol has been finalized. In short, a model that answers the query of interest with minimal total computational cost and memory is a strong candidate to be considered for AQSim.

4. Results

Below is an illustration, using the criteria discussed in Section 3, for a simple wavelet (finite length) model and a statistical model that uses the mean as the model.
The results have been broken into model building and model-query interaction, the latter covering some selected query types of interest and the memory usage.

Model building

Model building incurs the following computational cost.

    wavelets                  stat
    2*O(n) + O(n ln n)        N*O(n)

Here, the cost for the wavelets includes the cost of scanning through the data (a conservative high estimate is presented) and the need to cull the coefficients to obtain a target error. The scanning accounts for the cost of the downsampling when performing the decomposition. Note that, as the wavelet model is inherently multiresolutional, it can in theory work on just the parent node and achieve a high level of accuracy. As to the statistics model, the large N refers to the total number of nodes in the index tree. To achieve a desired level of accuracy, the data will be partitioned to the resolution that is desired. If the data is widely distributed or high precision is requested, N can approach or exceed n in value. The reason is that if each leaf node contains a single point, then the number of leaves alone is n; as this counts only the leaves and not the rest of the tree, N > n.

Model-query interaction

The following three query cases provide an overview of the computational cost the two models incur when queried.

Range query:

    wavelets                  stat_model
    O(n) + O(n)               O(ln N)
    (reconstruct/filter)      (find relevant input)

In the case of the wavelets, the extreme case is considered where the entire data is reconstructed and then filtered to find the values of interest. To simplify the analysis, only filtering was considered. (It is possible to have the wavelets reconstruct selected areas of the data, but this would be covered by the bookkeeping, and its functional form is not clearly known for the purposes of this exercise.) It may be possible to shorten this to O(d), where d << n, if a mapping can be achieved by linking the distribution of the data values to particular coefficients. For the statistics model, the cost is largely in searching a tree of N nodes to find the model values that satisfy the query.
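For the statistics model, the tree search behind a range query might look like the following sketch. It assumes each node stores only a mean, minimum, and maximum for one variable, prunes any subtree whose [min, max] interval cannot overlap the query range, and returns node means as approximate answers at a user-chosen depth. The names `StatNode` and `range_query`, the pruning rule, and the toy tree are assumptions for illustration and do not describe the AQSim index searcher.

```python
from dataclasses import dataclass, field


@dataclass
class StatNode:
    mean: float                     # the "model" for this node
    lo: float                       # minimum value seen in the node
    hi: float                       # maximum value seen in the node
    children: list = field(default_factory=list)


def range_query(node, qlo, qhi, max_depth, depth=0):
    """Means of the deepest visited nodes whose values may fall in [qlo, qhi]."""
    if node.hi < qlo or node.lo > qhi:
        return []                   # prune: nothing in this subtree can match
    if not node.children or depth == max_depth:
        return [node.mean]          # approximate answer at this resolution
    hits = []
    for child in node.children:
        hits.extend(range_query(child, qlo, qhi, max_depth, depth + 1))
    return hits


# Hand-built toy index over values in [0, 10].
tree = StatNode(5.0, 0.0, 10.0, [
    StatNode(2.0, 0.0, 4.0, [StatNode(1.0, 0.0, 2.0), StatNode(3.0, 2.0, 4.0)]),
    StatNode(8.0, 6.0, 10.0, [StatNode(7.0, 6.0, 8.0), StatNode(9.0, 8.0, 10.0)]),
])

coarse = range_query(tree, 1.5, 3.5, max_depth=0)
medium = range_query(tree, 1.5, 3.5, max_depth=1)
fine = range_query(tree, 1.5, 3.5, max_depth=2)
print(coarse, medium, fine)
```

Searching deeper trades time for accuracy: the coarse answer is the global mean, while deeper levels return means from only the partitions that can contain matching values.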
Dynamicity query:

    wavelets                  stat_model
    O(ln C)                   O(n) + O(G(n))

Since the wavelet model stores information about the dynamicity (variable activity) in the coefficients, this would simply be a search over these model parameters, where C is the number of coefficients. For the statistical model, it would require a reconstruction of the data followed by further processing of the data to define the dynamicity function. If something aside from standard deviation is desired, this can be costly. For example, G(n) = n^2 for considering pair-wise interactions.

Topological query:

    wavelets                  stat_model
    O(n) + ΣO(G(n))           m*O(ln N) + ΣO(G(n))

The primary cost of the wavelets lies in reconstructing the data and then building the necessary topological function, designated by the summation term. For the statistics model, the cost lies in finding all the relevant nodes m and searching through a tree of N nodes. Once the data is obtained, there is the issue of building the topological function, just as in the wavelet case.

In general the wavelet model is advantageous because of the ease with which the data can be reconstructed to a high level of accuracy. For the statistics model, there is the cost of searching through the index tree, but its accuracy is unlikely to match that of the wavelet system unless a large index tree is created, in which case N (nodes) can be greater than n (number of samples in the original data).

The two models use the following memory space per variable.

    wavelets                  stat
    C + n                     2*N

For the wavelets, C represents the number of coefficients stored for a given level of accuracy, and the n indicates the size of the index bookkeeping, where n is the number of samples in the original data. In a worst-case scenario this would represent the mapping between unique data values and the coefficients. If there are redundant values, it may be possible to reduce n considerably. Furthermore, if a common index scheme is used, then only one bookkeeping system may be required as opposed to one per variable. This would reduce storage costs considerably. As for the statistics model, only two parameters are needed, the mean and the standard deviation, hence the number 2. N refers to the number of nodes in the index tree. Again, if high precision is requested, N can reach or exceed n, as the mean's ability to accurately reflect the data increases as the partitions become increasingly localized.

In summary, while the actual performance does vary depending on the data distribution, it appears that if high accuracy is desired (N >> n), then a wavelet model would be useful for most of the applications; otherwise a statistics model will do.

5. Conclusions and Future Work

The current implementation of AQSim has only a statistical mean as a model, but it is being upgraded to incorporate a wavelet model to provide additional functionality. Additional work will also be done in the areas of index construction for high-dimensional data, partitioning schemes beyond the spatio-temporal bisection currently in place, mapping of model metadata to query types, and a metric for building models selectively. The significance of the last point is that many modeling techniques are computationally intensive to build, for example, O(n^3) for regression-based systems. As there is no guarantee that, for a given data distribution, the resulting model will be highly accurate, a metric is needed to indicate when a model has a reasonable chance of successful reconstruction. If the model performance is going to be poor, it will be better to use the mean as a crude model owing to its computational efficiency. Other research areas in the context of model evaluation include the following: how amenable the models are to being parallelized, how automated the model construction can be, how robust the model is in dealing with different data types, and whether the model can be constructed online.

References

Acharya, S., Gibbons, P.B., Poosala, V., and Ramaswamy, S. (1999), Join Synopses for Approximate Query Answering, in Proceedings of the ACM SIGMOD International Conf. on Management of Data (SIGMOD '99), June 1999.

Acharya, S., Gibbons, P.B., Poosala, V., and Ramaswamy, S. (1999a), The Aqua Approximate Query Answering System, demo in ACM SIGMOD International Conf. on Management of Data (SIGMOD '99), June

Chakrabarti, K., Garofalakis, M.N., Rastogi, R., and Shim, K. (2000), Approximate Query Processing Using Wavelets, Proceedings of the International Conference on Very Large Databases, Cairo, Egypt.
Cormen,TH, Leserson CE, and Rvest, RL (1998), Introducton to Algorthms, MIT Press Hachem, N, Bao C, and Taylor, S (1995), Approxmate Query Answerng n Numercal Databases, techncal report, Department of Computer Scence, Worcester Polytechnc Insttute Ioannds, YE, and Poosala, V, (1999) Hstogram-Based Approxmaton of Set-Valued Query Answers, n Proceedngs of the 25 th Internatonal Conference on Very Large Data Bases Myung, IJ (2000), The Importance of Complexty n Model Selecton, Journal of Mathematcal Psychology, vol 44, p Vtter, JS, and Wang, M (1999), Approxmate Computaton of Multdmensonal Aggregates of Sparse Data usng Wavelets, n Proceedngs of the ACM SIGMOD Internatonal Conf. on Management of Data (SIGMOD '99), June 1999 Vtter, JS, and Wang, M (1998), Data Cube Approxmaton and Hstograms va Wavelets, n Proceedngs of the 7th Internatonal Conference on Informaton and Knowledge Management of Data 6
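The storage trade-off from the cost analysis (C + n values per variable for the model, 2N for the statistics model) can be illustrated with a small sketch. It is not AQSim code: the one-dimensional bisection rule, the stopping tolerance `tol`, and every function name below are assumptions made for illustration only. It shows how the node count N of a mean/standard-deviation tree grows as the requested precision tightens, until it exceeds the number of samples n.

```python
import statistics


def build_stat_tree(values, tol):
    """Bisect `values` recursively; each node keeps (mean, std).

    Hypothetical splitting rule: a node is split in half while its
    standard deviation exceeds `tol` and it holds more than one sample.
    """
    node = {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "children": [],
    }
    if node["std"] > tol and len(values) > 1:
        mid = len(values) // 2
        node["children"] = [
            build_stat_tree(values[:mid], tol),
            build_stat_tree(values[mid:], tol),
        ]
    return node


def count_nodes(node):
    """Return N, the number of nodes in the index tree."""
    return 1 + sum(count_nodes(child) for child in node["children"])


def stat_storage(node):
    """Statistics model: two parameters per node, i.e. 2 * N."""
    return 2 * count_nodes(node)


def model_storage(num_coefficients, num_samples):
    """Coefficient-based model: C coefficients plus n bookkeeping entries."""
    return num_coefficients + num_samples


if __name__ == "__main__":
    samples = [float(i) for i in range(16)]  # n = 16, a simple ramp
    for tol in (10.0, 1.0, 0.0):
        tree = build_stat_tree(samples, tol)
        print(tol, count_nodes(tree), stat_storage(tree))
    print(model_storage(4, len(samples)))  # C = 4 is an arbitrary choice
```

With a loose tolerance the whole ramp is summarized by a single node, while a tolerance of zero forces one leaf per sample, so N exceeds n and the 2N storage term overtakes C + n. This matches the qualitative argument in the text and is not a measurement of the actual system.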

University of California, Lawrence Livermore National Laboratory, Technical Information Department, Livermore, CA 94551


More information

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.

More information

Time Value of Money Module

Time Value of Money Module Tme Value of Money Module O BJECTIVES After readng ths Module, you wll be able to: Understand smple nterest and compound nterest. 2 Compute and use the future value of a sngle sum. 3 Compute and use the

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

Inter-Ing 2007. INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007.

Inter-Ing 2007. INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007. Inter-Ing 2007 INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007. UNCERTAINTY REGION SIMULATION FOR A SERIAL ROBOT STRUCTURE MARIUS SEBASTIAN

More information

Cloud-based Social Application Deployment using Local Processing and Global Distribution

Cloud-based Social Application Deployment using Local Processing and Global Distribution Cloud-based Socal Applcaton Deployment usng Local Processng and Global Dstrbuton Zh Wang *, Baochun L, Lfeng Sun *, and Shqang Yang * * Bejng Key Laboratory of Networked Multmeda Department of Computer

More information

Sketching Sampled Data Streams

Sketching Sampled Data Streams Sketchng Sampled Data Streams Florn Rusu, Aln Dobra CISE Department Unversty of Florda Ganesvlle, FL, USA frusu@cse.ufl.edu adobra@cse.ufl.edu Abstract Samplng s used as a unversal method to reduce the

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce

WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce WSE-ntegrator: An Automatc ntegrator of Web Search nterfaces for E-Commerce Ha He, Wey Meng Dept. of Computer Scence SUNY at Bnghamton Bnghamton, NY 13902 {hahe,meng}@cs.bnghamton.edu Clement Yu Dept.

More information

Heuristic Static Load-Balancing Algorithm Applied to CESM

Heuristic Static Load-Balancing Algorithm Applied to CESM Heurstc Statc Load-Balancng Algorthm Appled to CESM 1 Yur Alexeev, 1 Sher Mckelson, 1 Sven Leyffer, 1 Robert Jacob, 2 Anthony Crag 1 Argonne Natonal Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439,

More information

Demographic and Health Surveys Methodology

Demographic and Health Surveys Methodology samplng and household lstng manual Demographc and Health Surveys Methodology Ths document s part of the Demographc and Health Survey s DHS Toolkt of methodology for the MEASURE DHS Phase III project, mplemented

More information

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In ths chapter, we wll learn how to descrbe the relatonshp between two quanttatve varables. Remember (from Chapter 2) that the terms quanttatve varable

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information