A TwoStep SelfEvaluation Algorithm On Imputation Approaches For Missing Categorical Data


 CSCJournals AICoordinator
 1 years ago
 Views:
Transcription
1 A TwoStep SelfEvaluaton Algorthm On Imputaton Approaches For Mssng Categorcal Data Lukun Zheng Faculty of Department of Mathematcs Tennessee Technologcal Unversty Cookevlle, TN 385, Unted States of Amerca Abstract Mssng data are often encountered n data sets and a common problem for researchers n dfferent felds of research. There are many reasons why observatons may have mssng values. For nstance, some respondents may not report some of the tems for some reason. The exstence of mssng data brngs dffcultes to the conduct of statstcal analyses, especally when there s a large fracton of data whch are mssng. Many methods have been developed for dealng wth mssng data, numerc or categorcal. The performances of mputaton methods on mssng data are key n choosng whch mputaton method to use. They are usually evaluated on how the mssng data method performs for nference about target parameters based on a statstcal model. One mportant parameter s the expected mputaton accuracy rate, whch, however, reles heavly on the assumptons of mssng data type and the mputaton methods. For nstance, t may requre that the mssng data s mssng completely at random. The goal of the current study was to develop a twostep algorthm to evaluate the performances of mputaton methods for mssng categorcal data. The evaluaton s based on the remputaton accuracy rate (RIAR) ntroduced n the current work. A smulaton study based on real data s conducted to demonstrate how the evaluaton algorthm works. Keywords: Categorcal Varable, Imputaton Methods, Mssng Value, ReImputaton Accuracy Rate.. INTRODUCTION AND MOTIVATION Data scentsts are often faced wth ssues of mssng data n dfferent stuatons [, 2]. There are several reasons why observatons may have mssng values. For nstance, one reason may be that some respondents just do not report some of the varable for some reason. Researchers have nvestgated the mpact of mssng data on statstcal analyses [3]. The mssng data may lead to based estmates and nflated standard errors. In addton, the power of statstcal tests may drop dramatcally when there s a large amount of mssng data n the data set. Data often are mssng n research n economcs, socology, and poltcal scence because governments choose not to, or fal to, report crtcal statstcs [4]. Sometmes mssng values are caused by the researcher. For example, when data collecton s done mproperly or mstakes are made n data entry [5]. There are dfferent types of mssng data based on the underlyng formaton process. Dfferent forms of mssng data have dfferent mpacts on the statstcal analyses. There are a lot of open resources about mssng data and nterested readers are encouraged to refer to them for a more thorough dscusson. Values n a data set are mssng completely at random (MCAR) f the events that lead to any datatem beng mssng are ndependent both of observable varables and of unobservable parameters of nterest, and occur entrely at random [6]. When data are MCAR, the mssng data are a random sample of the observed data and the analyss performed on the data s unbased. Mssng at random (MAR) occurs when the mssngness s not random, but where mssngness can be fully accounted for by varables where there s complete nformaton. For nstance, males are less lkely to enroll n a depresson survey than females but ths has nothng to do wth ther level of depresson. Fnally, Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28
2 mssng not at random (MNAR) (also known as nongnorable nonresponse) occurs when the mssngness s related to the value of the varable tself. To extend the prevous example, ths would occur f men faled to enroll n a depresson survey because of ther level of depresson. As another example, respondents mght be asked to ndcate the number of tmes they are pulled over due to speedng durng the prevous year. A mssng response would be MNAR f ndvduals who were pulled over for many tmes due to speedng durng ths perod were more lkely to leave the tem unanswered rather than report ther behavor. Gven the potental problems caused by the presence of mssng data, many methods have been suggested for data mputaton. People may refer to [7] for a comprehensve revew of many of these methods. Many mssng data technques have been suggested and developed. Accordng to [8], these mssng data technques can be roughly grouped nto:. technques gnorng ncomplete observatons, 2. mputatonbased technques, 3. weghtng technques, and 4. modelbased technques [2]. In ths paper, we wll gnore weghtng technques and manly focus on the mputatonbased technques and modelbased technques. The smple and wdely used technque s the socalled lst wse deleton whch gnore all ncomplete observatons and only focus on complete observatons. Lst wse deleton s easy to carry out and the result may be relable and satsfactory wth small porton of observatons wth mssng data. However, t fals f there s a large porton of observatons wth mssng data, whch mght be very common n hghdmensonal case. In addton, lst wse deleton may lead to serous bases f the data are mssng not at random. Imputatonbased technques provde a way to replace mssng values wth sutable estmates, whch results n mputed complete data set. Many methods have been suggested for mputng sutable responses to mssng data. Among these technques are mean or mode substtuton, regresson mputaton, knearest neghbor mputaton, Hot Deck mputaton, multple mputaton, etc. In the mputatonbased technques, mssng values are replaced by artfcal values and t may cause seres bases. And ths mputed data set n turn mght lead to based result n the subsequent data analyss. And most of these mputaton methods has been found nadequate n reproducng known populaton parameters and standard errors [7]. Modelbased mputaton technques performs parameter estmaton. Two popular modelbased mputaton technques are regressonbased and lkelhoodbased technques [2, 9]. In regressonbased mputaton, the mssng values are mputed by a regresson of the dependent varable wth mssng values on the observed values of other ndependent varables for a gven data set. In lkelhoodbased mputaton, the data are descrbed based on a model and the parameters are estmated by maxmzng lkelhood for a gven data set [2]. The mputaton methods can also be roughly classfed nto two: parametrc mputaton methods and nonparametrc mputaton methods. The parametrc mputaton methods are usually superor f the data set can be modeled adequately by a parametrc model. For nstance, the lnear regresson can be used to conduct the mputaton of mssng quanttatve values and logstc regresson can be used to conduct mputaton of mssng bnary qualtatve values [, ]. Nonparametrc mputaton methods conduct mputaton of mssng data by capturng structures n the data sets and they offer an alternatve f the users have no dea of the actual dstrbuton of the data set. In other words, t s benefcal to use a nonparametrc method n cases that the relatonshp between the response and explanatory varables are unknown. For nstance, the nearestneghbor (NN) mputaton algorthm s one of the nonparametrc methods used for mputaton of mssng data n sample surveys [2]. Chen and Huang [3] constructed a genetc system to mpute n relatonal database systems. The machne learnng methods also nclude decson tree mputaton, auto assocatve neural network and so forth. Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 2
3 Once the mputaton of mssng data s done, t s crucal to evaluate the performance of the mputaton technques used through determnng the effect of mputaton on subsequent statstcal nferences. All these mputaton methods have pros and cons and they may perform well n one or more stuatons but fals n others. And the comparson among these dfferent mputaton methods have been studed n many stuatons. And these comparsons are manly based on the potental consequences n the subsequent data analyss caused by correspondng mputaton methods. For nstance, [] compares the performance of three mputaton methods based on the estmaton accuracy of logstc regresson parameters, standard errors and hypothess testng results from the mputed data usng rounded multple mputaton for contnuous data(mi), stochastc regresson mputaton (SRI), and multple mputaton for categorcal data, respectvely. [4] studed the performance of three mssng value mputaton technques wth respect to dfferent rate of mssng values n the data set. The purpose of ths paper s to present a new selfevaluaton algorthm of mputaton approaches for mssng categorcal data values. The new evaluaton algorthm can be used to test a wde range of mputaton technques of mssng categorcal data. The performance of several leadng mputaton approaches s measured wth respect to dfferent rate of mssng values n the data set. To ths end, the paper provdes:. a descrpton of several popular and modern mputaton approaches, namely, MI, KNN, C5. decson tree, Nave Bayes, and polytomous regresson mputaton ordered (POLR). 2. a new selfevaluaton algorthm for mputaton approaches of mssng categorcal data. 3. a wde range of evaluaton of qualty of mputaton wth respect to the rate of mssng data. 4. several smulaton results based on real data. The paper s organzed as follows. Secton 2 provdes a descrpton of the relevant mputaton approaches and our proposed selfevaluaton algorthm. Secton 3 explans detals of the expermental study and presents and analyzes the results. Fnally, Secton 4 summarzes the paper. 2. IMPUTATION APPROACHES AND THE PROPOSED SELFEVALUATION ALGORITHM In ths secton, we wll ntroduce the mputaton approaches used n ths study and the proposed selfevaluaton algorthm. 2. Imputaton Approaches The mputaton technques used n ths paper nclude mean mputaton (MI), k nearest neghbor mputaton (KNN), C5. decson tree mputaton, Nave Bayes (NB), and polytomous regresson mputaton ordered (POLR).. Mean Imputaton (MI). In mean mputaton, the mssng values are mputed wth the mean for quanttatve data or the most frequent value (mode) for qualtatve data of the correspondng the varable. Anderson et al [5] states that the sample mean provdes an optmal estmate of the most probable value n the case of normal dstrbuton. The use of MI wll shrnk the sample varance and affect the correlaton between the mputed varable and other varables. Such mpact wll be sgnfcant f there s a hgh percentage of mssng values mputed usng MI and the subsequent statstcal nferences mght be msleadng due to the too much centrally located values created by MI [6]. 2. K Nearest Neghbor Imputaton (KNN). KNN s an effcent hot deck method to mpute the mssng value, n whch the mssng values are mputed based on ts k nearest neghbors n the whole data set accordng to some metrc. Ths method apples to both contnuous varables or dscrete varables. For contnuous varables, the most commonly used dstance metrc to determne the k nearest neghbors s the Eucldean dstance (Mnkowsk norm Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 3
4 / p n p d( x, y) x y wth p = 2). Then the mssng values s mputed as the average or weghted average of these k nearest neghbors. For dscrete varables, such as texts, other types of dstance metrc such as Hammng dstance or overlap metrc can be used to determne the k nearest neghbors. Then the mssng value s mputed as the most frequent value among the k nearest neghbors. Usually k s taken as a small postve nteger. When k =, the KNN method s the smlar response pattern mputaton (SRPI), whch conssts of dentfyng the nearest neghbor (the most smlar observaton) and mputng the mssng value by copyng the value of ths nearest neghbor. An advantage over mean mputaton s that the replacement values are nfluenced only by the most smlar cases rather than by all cases. Several studes have found that the knn method performs well or better than other methods n certan contexts [7, 8]. A shortcomng of the KNN mputaton s that t s senstve to the local structure of the data. 3. C5. Decson Tree Imputaton (C5.) [9]. C5. extends the C4.5 classfcaton algorthms descrbed n [2]. It s a wellknown machne learnng algorthm whch has a good nternal method to treat mssng values. Hurley (27) dd a comparatve study, wth other smple methods to treat mssng values, and concluded that t was one of the best methods. C5. uses the nformaton gan rato measure to choose a good test on one categorcal varable that has mutually exclusve outcomes O, O2,..., Or for a gven tranng data set T. Then T s parttoned nto T, T2,..., Tr wth T consstng of the observatons n T that wll be classfed as O by the test. The same algorthm s then appled to each subset T untl a stop crteron s encountered. When an nstance n T wth known value s assgned to a subset T, ths ndcates that the probablty of that nstance belongng to subset T s and to all other subsets s. When the value s mssng, C5. assocates to each nstance n T a weght representng the probablty of that nstance belongng to T. Ths probablty s estmated as the sum of the weghts of nstances n T known to satsfy the test wth outcome O, dvded by the sum of weghts of the cases n T wth known values on the varable. 4. NaveBayes (NB). Nave Bayes s an also a machne learnng technque [2]. The algorthm works wth dscrete data and requres only one pass through the database to generate a classfcaton model, whch makes t computatonally effcent. Ths algorthm assumes that the feature or attrbute values are condtonally ndependent gven the class of the attrbute, d P( x, x2,..., xd c) P( x c), where x s the th attrbute, c represents the class, and d s the number of attrbutes. The data are dvded nto two parts: ) tranng database that ncludes all records for whch class attrbute s complete and 2) testng database for whch the records are mssng. Imputaton based on the Nave Bayes conssts of two smple steps. Frst, the condtonal probabltes P( x c) and the pror probabltes P(c) are estmated based on the tranng database. Then the estmated probabltes are then used to conduct mputaton of the mssng values of the attrbutes n testng database based on the rule: arg max c P ( c x, x,..., x ) arg max P( c). 2 d c d P( x c). 5. Polytomous Regresson Imputaton Ordered (POLR). Ths s a regresson mputaton method based on proportonal odds model. The proportonal odds model s estmated frst based on avalable complete or ncomplete observatons and then used to mpute mssng values of a varable based on the ftted values from the model. Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 4
5 Imputaton Approaches Works wth contnuous Works wth dscrete data? data? MI Yes Yes KNN Yes Yes C5. No Yes NaïveBayes No Yes POLR Yes Yes TABLE : Summary of Imputaton Approaches. Table summarzed the mputaton approaches used n ths study. 2.2 The SelfEvaluaton Algorthm The evaluaton algorthm proposed n the paper conssts of two steps: mputaton and remputaton. Gven a data set, let Y be a categorcal varable wth mssng values n the data set T and let X X, X,..., ) corresponds all other varables. Assume that there are n ( 2 X d ( x, x2,..., xd ; y observatons o ),,2,..., n n the data set. We may assume that there are no mssng values for other varables n the data set. Splt the data set nto two sets: one set S wth observatons wth no mssng values and the other set S of observatons wth mssng values. Assume that there are n observatons n S and n observatons n S. Therefore, n s n the number of mssng values n the data set. And r s the proporton of mssng values n Y. We select one mputaton technque to fll n the mssng value. To evaluate the performance of the selected mputaton technque, we propose the followng selfevaluaton algorthm. Step (Imputaton): Conduct the mputaton of mssng values n S usng the selected mputaton technque to obtan an mputed complete data set. Step 2 (Remputaton): From the mputed complete data set, delete certan number g of values of Y from the data set S S usng approprate method to obtan a new ncomplete data set. Then rempute the mssng values usng the same mputaton method n Step. Here the method to deleted values of Y must be chosen to keep the type of mssng values the same as those n the orgnal data set. For nstance, f the mssng value n the orgnal data set s mssng completely at random, then we must delete the values of Y n S completely at random. n Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 5
6 The man procedures n the selfevaluaton algorthm are shown n Fgure. FIGURE : SelfEvaluaton Algorthm (? denotes mssng values). In the remputaton step, we have the real values for those mssng snce they are made mssng from the data set S. Therefore, the mputaton accuracy can be obtaned. In fact, let c c c, 2,... m be the possble values of Y. For each observaton c be the real value of Y n the observaton j j o j, j,..., g wth value of Y deleted n o and let c j be the mputed value after the remputaton step. Therefore, the number of correctly mputed mssng values s S, let T g j I( c j c j ) () where I( c c, f c j c j ), Otherwse s the ndcator functon. The remputaton accuracy rate (RIAR) s j j T RIAR g (2) where g s the total number of observatons n S wth correspondng Y values deleted and later remputed. We may repeat these two steps for K tmes ndependently, and, as a result, we obtan correspondngly K assocated mputaton accuracy rates RIAR, k,..., K. Then the average Imputaton accuracy rate s obtaned as: RIAR K k RIAR Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 6 K k K k k T gk k (3)
7 where T k s the number of correctly mputed mssng values n the kth repetton. 2.3 Estmaton and Comparson Due to the law of large numbers, the average mputaton accuracy rate tends to be the expected mputaton accuracy rate under certan assumptons. In other words, as we do suffcent number K of repettons of the algorthm, the average remputaton accuracy rate wll approach the true remputaton accuracy rate and can, therefore, can be used to measure the performance of the mputaton technque. Theorem. Gven a data set wth mssng categorcal values and an mputaton technque, let RIAR be the average remputaton accuracy rate, then t converges to the expected remputaton accuracy rate E(RIAR ) n probablty. That s, RIAR P E( RIAR ), as K. (4) The followng theorem s a drect result from central lmt theorem on proportons. Theorem 2. Gven a data set wth mssng categorcal values and an mputaton technque, let RIAR be the average remputaton accuracy rate, then RIAR E( RIAR ) L N(,), RIAR as K. (5) Where E( RIAR )[ E( RIAR )]/ K. RIAR From Theorems, 2, and Slutsky's theorem, we have the followng corollary. Corollary. Gven a data set wth mssng categorcal values and an mputaton technque, let RIAR be the average remputaton accuracy rate, then RIAR E( RIAR ) RIAR ( RIAR ) / K L N(,), as K. (6) These theorems and corollary provde statstcal tools for estmaton and comparson of the remputaton accuracy rates based on dfferent mputaton approaches on a data set wth mssng categorcal data. The expected mputaton accuracy rate E(IAR) s an mportant measure of the performance of a gven mputaton technque. However, there are many cases n practce that no real values of the mssng value are known, whch makes t mpossble to obtan the mputaton accuracy rate. In the remputaton step of the proposed algorthm, t becomes possble to obtan the remputaton accuracy rate snce the mssng values of Y are made mssng n the data set S and we do have the correspondng real values. The remputaton accuracy rate can be obtaned to measure the performance of the mputaton technque. The fact that the same mputaton technque s used to evaluate the performance of tself leads to the name of the proposed selfevaluaton algorthm for mputaton approaches". The proposed algorthm s applcable to a wde range of mputaton methods. There are several evaluaton algorthms whch have been proved to be effectve n many stuatons for comparng dfferent mputaton methods [, 8]. However, they usually assume strong assumptons on the type of varables and statstcal models. Our algorthm s based on the Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 7
8 expected remputaton accuracy rate, whch assumes no assumptons on the mssng values n the dataset and models on evaluatons. Hence, t s vald to perform evaluatons on a wde range of mputaton methods n dfferent stuatons. In addton, the proposed algorthm s easer to perform evaluatons compared wth many other algorthms. Imputaton methods are usually evaluated on how they perform for nference about target parameters based on a statstcal model such as a regresson model. Sometmes, these statstcal models are complcated and come wth strong assumptons, whch makes the evaluatons hard to perform and restrct the applcatons to some lmted stuatons. In our proposed algorthm, the performance can be easly obtaned n a twostep procedure based on a smple valuaton measure E(RIAR) and very basc assumptons. In the next secton, we wll use the proposed evaluaton algorthm to measure the performance of the fve mputaton approaches mentoned n secton 2. based on a real data set. 3. EXPERIMENT AND RESULTS The man purpose of the experments s to emprcally evaluate the performances of mputaton approaches based on ther remputaton accuracy rates usng the proposed selfevaluaton algorthm. We wll start wth the descrpton of the data sets. The expermental results and analyss wll follow. Varables Categores Frequency Relatve Frequency Rank (Ordnal). Assstant Professor Assocate Professor Full Professor Mssng NA Dscplne (Nomnal) yrs.servce (Ordnal) Salary (Ordnal). Theoretcal (A) Appled (B) Mssng NA. Less than From 6 to More than Mssng 5 NA. Less than $85, From $85, to $, 3. More than $, Mssng 22 NA TABLE 2: Descrptve Summary of Varables In The Data Set. 3. Estmaton and Comparson The experments are based on the data set Salares" ncluded n the R package car [22]. The data set ncluded the 289 nnemonth academc salares for assstant professors, assocate professors and full professors n a college n the U.S. After some ntal data processng, the resultng data set contaned nformaton of 67 assstant professors, 64 assocate professors, and 94 full professors on four varables, namely, rank, dscplne, years of servce (yrs.servce), and salary. The varable rank has three dfferent levels: assstant professor (AsstProf), assocate professor (AssocProf), and full professor (Prof). The varable dscplne has two levels: theoretcal department (A) and appled department (B). The varable years of servce (yrs.servce) s classfed nto three categores wth level f t s less than sx years, level 2 f t s from sx years to ffteen years, and level 3 f t s more than ffteen years. Smlarly, the varable salary s also classfed nto three categores wth level f t s less than $ 85,, level 2 f t s more than or equal to $ 85, and less than $,, and level 3 f t s more than $,. There are ffteen mssng values for the varable years of servce (yrs.servce) and twenty two Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 8
9 mssng values for the varable salary. A descrptve summary of the resultng data set s gven n table Expermental Setup The experments were performed n the followng procedure. The data set was frst mputed usng a selected mputaton method. Next, the mssng data were generated randomly, usng the MCAR mechansm, from the data set S(the set contanng complete observatons) n the followng amounts: 5%, %, 2%, and 3%. Then the mssng data were agan mputed usng the same mputaton method and the remputaton accuracy rate (RIAR) of the selected mputaton method was calculated and evaluated through estmaton. The mputaton methods used ncludes mean mputaton (MI), KNN, C5. decson tree mputaton, NaveBayes(NB), and polytomous regresson mputatonordered (POLR). The results of the above experments were then examned and compared. 3.3 Expermental Results The experments report the remputaton accuracy rate aganst four amounts of mssng values, for fve dfferent mputaton methods mentoned above. For KNN, three neghbors are consdered. In these experment, the accuracy rate s measured based on a zeroone loss, whch s commonly used to evaluate the performance of mputaton methods. We assume a unform cost for all the categores of all varables. A future study would consder dfferent categores havng dfferent assocated costs. Table 3 presents the average remputaton accuracy rate of these fve dfferent mputaton methods at dfferent amounts of mssng values based on K = repettons. The best results based on these average RIAR's are hghlghted n bold. We note that the hghest remputaton accuracy rate over all amounts of mssng data s 7.9% for 5% mssng data remputed by the C5 decson tree mputaton for the varable yrs.servce. The hghest remputaton accuracy rate over all amounts of mssng data s 8.6% for % mssng data remputed by the C5 decson tree mputaton for the varable salary. Based on these results, we notce that C5 s n general the best mputaton method among these fve mputaton methods for the mssng values n the data set. Varable yrs.servce salary Mssng Imputaton Methods (%) MI KNN C5 Naïve Bayes POLR TABLE 3: Average remputaton accuracy rates for two varables, fve mputaton methods and four amounts of mssng data based on repettons (Best average RIAR's are shown n bold). 3.4 Statstcal Comparson Now we use test of sgnfcance to compare the performance of the mputaton methods based on the results of remputaton accuracy rates. We analyze the statstcal sgnfcance of dfferences n remputaton accuracy rates between mputaton methods at 95% level of sgnfcance. We want to test whether two mean remputaton accuracy rates between two dfferent mputaton methods for a gven amount of mssng data are dfferent. The null hypothess H s that they are the same. Snce the average remputaton accuracy rates are computed based on K = Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 9
10 ndependent repettons, a Ztest s appled for the comparson. The standard error of the samplng dstrbuton of the accuracy rate dfference s SE p( p) n n 2 2p( p) K, where n ˆ ˆ ˆ ˆ p n2 p2 p p2 p n n 2 2 s the pooled sample accuracy rate ( n n 2 K ). Here ˆp and ˆp 2 denotes the sample remputaton accuracy rates of two selected mputaton methods, respectvely. For llustraton, we test the sgnfcance of dfferences n remputaton accuracy rates usng KNN and all other methods at dfferent amounts of mssng data. The results are summarzed n table 4. Varable yrs.servce salary Mssng (%) MI Sgnfcance (zscore) Imputaton Methods C5 Sgnfcance (zscore) Naïve Bayes Sgnfcance (zscore) POLR Sgnfcance (zscore) 5 (.77) + + (4.2) (.59) + + (3.4) (:65) + + (5:) + + (3:8) + + (4:8) 2 (9:) + + (6:46) + + (4:55) + + (5:63) 3 (7:78) + + (7:82) + + (6:) + + (6:83) 5 (:4) + + (8:44) + + (8:) (2:39) (9.3) + + (9:73) + + (9:45) (:3) 2 (7:23) + + (:38) + + (:39) (:87) 3 (5:63) + + (2:77) + + (2:78) + + (2:74) TABLE 4: Statstcal sgnfcance of the dfference of remputaton accuracy rates usng KNN and other methods at dfferent amounts of mssng data. Here, ++" ndcates that the gven mputaton method (columns) gves statstcally sgnfcantly better remputaton accuracy rate than KNN; " ndcates that the gven mputaton method (columns) gves statstcally sgnfcantly worse remputaton accuracy rate than KNN; " ndcates that the dfference between the accuracy rates usng the gven mputaton methods and KNN s nsgnfcant. 4. CONCLUSION In ths paper, we have ntroduced a new evaluaton algorthm for mputaton methods of mssng categorcal values. There are two man steps n the algorthm. In the frst step, the ncomplete data set s mputed usng a selected mputaton method to obtan an mputed complete data set frst. In the second step, a certan amount of mssng values are generated from the orgnally exstng observatons n the mputed complete data set and then these mssng values get remputed. The remputaton accuracy rate s obtaned based on certan number of repettons of ths process and used to evaluate the performance of selected mputaton method. From Table 3 and 4, we can see that, among the selected mputaton methods, C5 reconstructed the mssng data n a better manner n general. The Naïve Bayes methods was comparable to C5 only for the varable salary when the amount of mssng values s 2 percent or 3 percent. The mean mputaton method produced worse results compared to other methods. Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28
11 The algorthm assumes no assumptons on the mssng values. That s, t appled to any type of mssng data (MCAR, MAR, or mssng not at random). In addton, researchers can evaluate most of mputaton methods avalable for categorcal data usng our algorthm on a specfc gven data set. The am of the algorthm s to twofold. Frst, t ntroduces an evaluaton algorthm of mputaton methods for categorcal data n a data set and, therefore, provdes the best" mputaton method for the mssng categorcal data. Second, t can shed some nsghts on the true reason of mssng values n the data set based on the performance of dfferent mputaton methods. In ths paper, we only studed the evaluaton algorthm for mssng categorcal data. The evaluaton algorthm for mssng contnuous data needs to be studed n future works. 5. REFERENCES [] W.H. Fnch. Imputaton methods for mssng categorcal questonnare data: a comparson of approaches. Journal of Data Scence, vol. 8, pp , 2. [2] R.J.A. Lttle. and D.B. Rubn. Statstcal Analyss wth Mssng Data. New York: Wley, 987. [3] E. D. de Leeuw, J. Hox, and M. Husman. Preventon and treatment of tem nonresponse. Journal of Offcal Statstcs, vol. 9, pp , 23. [4] S. F. Messner. Explorng the Consequences of Erratc Data Reportng for Cross Natonal Research on Homcde. Journal of Quanttatve Crmnology, vol. 8, pp.5573, 992. [5] D. J. Hand, H. J. Adér, and G. J. Mellenbergh. Advsng on Research Methods: A Consultant's Companon. Huzen, Netherlands: Johannes van Kessel. pp , 28. [6] J. L. Schafer. Analyss of Incomplete Multvarate Data. Chapman and Hall, 997. [7] J. L. Schafer and J. W. Graham. Mssng data: Our vew of the state of the art. Psychologcal Methods, vol. 7, pp.4777, 22. [8] I. Myrtvert and E. Stensrud. Analyzng data sets wth mssng data: an emprcal evaluaton of mputaton methods and lkelhoodbased methods. IEEE Transactons On Software Engneerng, vol. 27, pp.9993, 2. [9] D.B. Rubn. Multple mputaton after 8+ years. J. Am. Stat. Assoc, vol. 9, pp , 996. [] S.C. Zhang, et al. Optmzed parameters for mssng data mputaton. PRICAI, vol. 6, pp. 6, 26. [] Q. Wang and J. Rao, Emprcal lkelhoodbased nferences n lnear models wth mssng data. Scand. J. Statst, vol. 29, pp , 22. [2] J. Chen and J. Shao. Jackknfe varance estmaton for nearestneghbor mputaton. J. Amer. Statst, Assoc, vol. 96, pp , 2. [3] S.M. Chen and C.M. Huang. Generatng weghted fuzzy rules from relatonal database systems for estmatng null values usng genetc algorthms. IEEE Transactons on Fuzzy Systems, vol., pp , 23. [4] R.S. Somasundaram and R. Nedunchezhan. Evaluaton of three smple mputaton methods for enhancng preprocessng of data wth mssng values. Internatonal Journal of Computer Applcatons, vol. 2, pp. 49, 2. Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28
12 [5] A.B. Anderson, A. Baslevsky, and D.P.J. Hum. Mssng data: a revew of the lterature, n Handbook of Survey Research. New York: Academc Press, 983, pp [6] M.J. Rovne and M. Delaney. Mssng data estmaton n developmental research, n Statstcal Methods n Longtudnal Research: Prncples and Structurng Change, A. Von Eye ed., New York: Academc Press, pp [7] O. Troyanskaya, M. Cantor, and G. Sherlock. Mssng value estmaton methods for DNA mcroarrays. Bonformatcs, vol. 7, pp , 2. [8] J. Chen and J. Shao. Nearest neghbor mputaton for survey data. Journal of Offcal Statstcs, vol. 6, pp. 33, 2. [9] L. Hurley. Mssng covarates n causal nference matchng: Statstcal mputaton usng machne learnng and evolutonary search algorthms. Doctoral dssertaton, Fordham Unversty, 27. [2] J. R. Qunlan. C4.5: Programs for machne learnng, Morgan Kaufman, Los Altos, CA, 993. [2] R.O. Duda and P.E. Hart. Pattern Classfcaton and Scene Analyss, New York: Wley, 973. [22] J. Fox, S. Wesberg, D. Adler, D. Bates, G. BaudBovy, S. Ellson,... and R. Heberger. Package car, Companon to Appled Regresson. R Package verson, 2, 26. Internatonal Journal of Scentfc and Statstcal Computng (IJSSC), Volume (7) : Issue () : 28 2
Can Auto Liability Insurance Purchases Signal Risk Attitude?
Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? ChuShu L Department of Internatonal Busness, Asa Unversty, Tawan ShengChang
More informationAn Alternative Way to Measure Private Equity Performance
An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate
More informationPSYCHOLOGICAL RESEARCH (PYC 304C) Lecture 12
14 The Chsquared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed
More informationThe Development of Web Log Mining Based on ImproveKMeans Clustering Analysis
The Development of Web Log Mnng Based on ImproveKMeans Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.
More informationbenefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).
REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or
More informationFeature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College
Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure
More informationWhat is Candidate Sampling
What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble
More informationForecasting the Direction and Strength of Stock Market Movement
Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract  Stock market s one of the most complcated systems
More informationCalculation of Sampling Weights
Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a twostage stratfed cluster desgn. 1 The frst stage conssted of a sample
More informationStatistical Methods to Develop Rating Models
Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and
More informationStudy on Model of Risks Assessment of Standard Operation in Rural Power Network
Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,
More informationCS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements
Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there
More informationSingle and multiple stage classifiers implementing logistic discrimination
Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul  PUCRS Av. Ipranga,
More information1. Measuring association using correlation and regression
How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a
More informationThe OC Curve of Attribute Acceptance Plans
The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4
More informationModule 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..
More informationCHAPTER 14 MORE ABOUT REGRESSION
CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp
More informationCHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol
CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL
More informationNPAR TESTS. OneSample ChiSquare Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6
PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has
More informationAn Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services
An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsnyng Wu b a Professor (Management Scence), Natonal Chao
More informationSTATISTICAL DATA ANALYSIS IN EXCEL
Mcroarray Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 6 Some Advanced Topcs Dr. Petr Nazarov 1401013 petr.nazarov@crpsante.lu Statstcal data analyss n Ecel. 6. Some advanced topcs Correcton for
More informationLatent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006
Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model
More informationEditing and Imputing Administrative Tax Return Data. Charlotte Gaughan Office for National Statistics UK
Edtng and Imputng Admnstratve Tax Return Data Charlotte Gaughan Offce for Natonal Statstcs UK Overvew Introducton Lmtatons Data Lnkng Data Cleanng Imputaton Methods Concluson and Future Work Introducton
More informationDEFINING %COMPLETE IN MICROSOFT PROJECT
CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMISP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,
More informationOn the Optimal Control of a Cascade of HydroElectric Power Stations
On the Optmal Control of a Cascade of HydroElectrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;
More informationBayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending
Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success
More informationEvaluating credit risk models: A critique and a new proposal
Evaluatng credt rsk models: A crtque and a new proposal Hergen Frerchs* Gunter Löffler Unversty of Frankfurt (Man) February 14, 2001 Abstract Evaluatng the qualty of credt portfolo rsk models s an mportant
More informationA hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm
Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):18841889 Research Artcle ISSN : 09757384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel
More informationA Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy Scurve Regression
Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy Scurve Regresson ChengWu Chen, Morrs H. L. Wang and TngYa Hseh Department of Cvl Engneerng, Natonal Central Unversty,
More informationAn InterestOriented Network Evolution Mechanism for Online Communities
An InterestOrented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne
More information8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by
6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng
More informationAn Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems
STANCS73355 I SUSE73013 An Analyss of Central Processor Schedulng n Multprogrammed Computer Systems (Dgest Edton) by Thomas G. Prce October 1972 Techncal Report No. 57 Reproducton n whole or n part
More informationGender differences in revealed risk taking: evidence from mutual fund investors
Economcs Letters 76 (2002) 151 158 www.elsever.com/ locate/ econbase Gender dfferences n revealed rsk takng: evdence from mutual fund nvestors a b c, * Peggy D. Dwyer, James H. Glkeson, John A. Lst a Unversty
More informationSIMPLE LINEAR CORRELATION
SIMPLE LINEAR CORRELATION Smple lnear correlaton s a measure of the degree to whch two varables vary together, or a measure of the ntensty of the assocaton between two varables. Correlaton often s abused.
More informationRecurrence. 1 Definitions and main statements
Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.
More informationTHE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES
The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered
More informationNEUROFUZZY INFERENCE SYSTEM FOR ECOMMERCE WEBSITE EVALUATION
NEUROFUZZY INFERENE SYSTEM FOR EOMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State
More informationThe Current Employment Statistics (CES) survey,
Busness Brths and Deaths Impact of busness brths and deaths n the payroll survey The CES probabltybased sample redesgn accounts for most busness brth employment through the mputaton of busness deaths,
More informationThe Application of Fractional Brownian Motion in Option Pricing
Vol. 0, No. (05), pp. 738 http://dx.do.org/0.457/jmue.05.0..6 The Applcaton of Fractonal Brownan Moton n Opton Prcng Qngxn Zhou School of Basc Scence,arbn Unversty of Commerce,arbn zhouqngxn98@6.com
More informationCausal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting
Causal, Explanatory Forecastng Assumes causeandeffect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of
More informationProject Networks With MixedTime Constraints
Project Networs Wth MxedTme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa
More informationExhaustive Regression. An Exploration of RegressionBased Data Mining Techniques Using Super Computation
Exhaustve Regresson An Exploraton of RegressonBased Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The
More information1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)
6.3 /  Communcaton Networks II (Görg) SS20  www.comnets.unbremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes
More informationInequality and The Accounting Period. Quentin Wodon and Shlomo Yitzhaki. World Bank and Hebrew University. September 2001.
Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.
More informationTransition Matrix Models of Consumer Credit Ratings
Transton Matrx Models of Consumer Credt Ratngs Abstract Although the corporate credt rsk lterature has many studes modellng the change n the credt rsk of corporate bonds over tme, there s far less analyss
More informationCourse outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy
Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton
More informationA DATA MINING APPLICATION IN A STUDENT DATABASE
JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (5357) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng BüyükbakkalköyIstanbul
More informationHow Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence
1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh
More informationDocument Clustering Analysis Based on Hybrid PSO+Kmeans Algorithm
Document Clusterng Analyss Based on Hybrd PSO+Kmeans Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,
More informationFREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES
FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EKMUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan
More informationRiskbased Fatigue Estimate of Deep Water Risers  Course Project for EM388F: Fracture Mechanics, Spring 2008
Rskbased Fatgue Estmate of Deep Water Rsers  Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn
More informationL10: Linear discriminants analysis
L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss
More informationA DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATIONBASED OPTIMIZATION. Michael E. Kuhl Radhamés A. TolentinoPeña
Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATIONBASED OPTIMIZATION
More informationRESEARCH ON DUALSHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo.
ICSV4 Carns Australa 9 July, 007 RESEARCH ON DUALSHAKER SINE VIBRATION CONTROL Yaoq FENG, Hanpng QIU Dynamc Test Laboratory, BISEE Chna Academy of Space Technology (CAST) yaoq.feng@yahoo.com Abstract
More informationSPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:
SPEE Recommended Evaluaton Practce #6 efnton of eclne Curve Parameters Background: The producton hstores of ol and gas wells can be analyzed to estmate reserves and future ol and gas producton rates and
More informationSearching for Interacting Features for Spam Filtering
Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, YunChao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software
More informationCHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES
CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In ths chapter, we wll learn how to descrbe the relatonshp between two quanttatve varables. Remember (from Chapter 2) that the terms quanttatve varable
More informationTHE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek
HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo
More informationTraditional versus Online Courses, Efforts, and Learning Performance
Tradtonal versus Onlne Courses, Efforts, and Learnng Performance KuangCheng Tseng, Department of Internatonal Trade, ChungYuan Chrstan Unversty, Tawan ShanYng Chu, Department of Internatonal Trade,
More informationNONCONSTANT SUM REDANDBLACK GAMES WITH BETDEPENDENT WIN PROBABILITY FUNCTION LAURA PONTIGGIA, University of the Sciences in Philadelphia
To appear n Journal o Appled Probablty June 2007 OCOSTAT SUM REDADBLACK GAMES WITH BETDEPEDET WI PROBABILITY FUCTIO LAURA POTIGGIA, Unversty o the Scences n Phladelpha Abstract In ths paper we nvestgate
More informationFragility Based Rehabilitation Decision Analysis
.171. Fraglty Based Rehabltaton Decson Analyss Cagdas Kafal Graduate Student, School of Cvl and Envronmental Engneerng, Cornell Unversty Research Supervsor: rcea Grgoru, Professor Summary A method s presented
More informationIMPACT ANALYSIS OF A CELLULAR PHONE
4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng
More informationHOUSEHOLDS DEBT BURDEN: AN ANALYSIS BASED ON MICROECONOMIC DATA*
HOUSEHOLDS DEBT BURDEN: AN ANALYSIS BASED ON MICROECONOMIC DATA* Luísa Farnha** 1. INTRODUCTION The rapd growth n Portuguese households ndebtedness n the past few years ncreased the concerns that debt
More informationEnterprise Master Patient Index
Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an
More informationPRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION
PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIIOUS AFFILIATION AND PARTICIPATION Danny CohenZada Department of Economcs, Benuron Unversty, BeerSheva 84105, Israel Wllam Sander Department of Economcs, DePaul
More informationMAPP. MERIS level 3 cloud and water vapour products. Issue: 1. Revision: 0. Date: 9.12.1998. Function Name Organisation Signature Date
Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPPATBDClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller
More informationChapter XX More advanced approaches to the analysis of survey data. Gad Nathan Hebrew University Jerusalem, Israel. Abstract
Household Sample Surveys n Developng and Transton Countres Chapter More advanced approaches to the analyss of survey data Gad Nathan Hebrew Unversty Jerusalem, Israel Abstract In the present chapter, we
More informationInfluence and Correlation in Social Networks
Influence and Correlaton n Socal Networks Ars Anagnostopoulos Rav Kumar Mohammad Mahdan Yahoo! Research 701 Frst Ave. Sunnyvale, CA 94089. {ars,ravkumar,mahdan}@yahoonc.com ABSTRACT In many onlne socal
More informationMethodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications
Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and
More informationMETHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS
METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng
More informationPrediction of Disability Frequencies in Life Insurance
Predcton of Dsablty Frequences n Lfe Insurance Bernhard Köng Fran Weber Maro V. Wüthrch October 28, 2011 Abstract For the predcton of dsablty frequences, not only the observed, but also the ncurred but
More information1.1 The University may award Higher Doctorate degrees as specified from timetotime in UPR AS11 1.
HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher
More informationSketching Sampled Data Streams
Sketchng Sampled Data Streams Florn Rusu, Aln Dobra CISE Department Unversty of Florda Ganesvlle, FL, USA frusu@cse.ufl.edu adobra@cse.ufl.edu Abstract Samplng s used as a unversal method to reduce the
More informationPrediction of Disability Frequencies in Life Insurance
1 Predcton of Dsablty Frequences n Lfe Insurance Bernhard Köng 1, Fran Weber 1, Maro V. Wüthrch 2 Abstract: For the predcton of dsablty frequences, not only the observed, but also the ncurred but not yet
More informationSurvival analysis methods in Insurance Applications in car insurance contracts
Survval analyss methods n Insurance Applcatons n car nsurance contracts Abder OULIDI 1 JeanMare MARION 2 Hervé GANACHAUD 3 Abstract In ths wor, we are nterested n survval models and ther applcatons on
More informationFinancial Instability and Life Insurance Demand + Mahito Okura *
Fnancal Instablty and Lfe Insurance Demand + Mahto Okura * Norhro Kasuga ** Abstract Ths paper estmates prvate lfe nsurance and Kampo demand functons usng householdlevel data provded by the Postal Servces
More informationNumber of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000
Problem Set 5 Solutons 1 MIT s consderng buldng a new car park near Kendall Square. o unversty funds are avalable (overhead rates are under pressure and the new faclty would have to pay for tself from
More informationv a 1 b 1 i, a 2 b 2 i,..., a n b n i.
SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 455 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces we have studed thus far n the text are real vector spaces snce the scalars are
More informationPredicting Software Development Project Outcomes *
Predctng Software Development Project Outcomes * Rosna Weber, Mchael Waller, June Verner, Wllam Evanco College of Informaton Scence & Technology, Drexel Unversty 3141 Chestnut Street Phladelpha, PA 19104
More informationScale Dependence of Overconfidence in Stock Market Volatility Forecasts
Scale Dependence of Overconfdence n Stoc Maret Volatlty Forecasts Marus Glaser, Thomas Langer, Jens Reynders, Martn Weber* June 7, 007 Abstract In ths study, we analyze whether volatlty forecasts (judgmental
More informationA Probabilistic Theory of Coherence
A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want
More information1 De nitions and Censoring
De ntons and Censorng. Survval Analyss We begn by consderng smple analyses but we wll lead up to and take a look at regresson on explanatory factors., as n lnear regresson part A. The mportant d erence
More informationThe impact of hard discount control mechanism on the discount volatility of UK closedend funds
Investment Management and Fnancal Innovatons, Volume 10, Issue 3, 2013 Ahmed F. Salhn (Egypt) The mpact of hard dscount control mechansm on the dscount volatlty of UK closedend funds Abstract The mpact
More informationAPPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT
APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedocho
More informationStaff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall
SP 200502 August 2005 Staff Paper Department of Appled Economcs and Management Cornell Unversty, Ithaca, New York 148537801 USA Farm Savngs Accounts: Examnng Income Varablty, Elgblty, and Benefts Brent
More informationLearning from Large Distributed Data: A Scaling Down Sampling Scheme for Efficient Data Processing
Internatonal Journal of Machne Learnng and Computng, Vol. 4, No. 3, June 04 Learnng from Large Dstrbuted Data: A Scalng Down Samplng Scheme for Effcent Data Processng Che Ngufor and Janusz Wojtusak part
More informationData Broadcast on a MultiSystem Heterogeneous Overlayed Wireless Network *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819840 (2008) Data Broadcast on a MultSystem Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,
More informationDecision Tree Model for Count Data
Proceedngs of the World Congress on Engneerng 2012 Vol I Decson Tree Model for Count Data Yap Bee Wah, Norashkn Nasaruddn, Wong Shaw Voon and Mohamad Alas Lazm Abstract The Posson Regresson and Negatve
More informationCharacterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University
Characterzaton of Assembly Varaton Analyss Methods A Thess Presented to the Department of Mechancal Engneerng Brgham Young Unversty In Partal Fulfllment of the Requrements for the Degree Master of Scence
More informationStatistical algorithms in Review Manager 5
Statstcal algorthms n Reve Manager 5 Jonathan J Deeks and Julan PT Hggns on behalf of the Statstcal Methods Group of The Cochrane Collaboraton August 00 Data structure Consder a metaanalyss of k studes
More informationProceedings of the Annual Meeting of the American Statistical Association, August 59, 2001
Proceedngs of the Annual Meetng of the Amercan Statstcal Assocaton, August 59, 2001 LISTASSISTED SAMPLING: THE EFFECT OF TELEPHONE SYSTEM CHANGES ON DESIGN 1 Clyde Tucker, Bureau of Labor Statstcs James
More informationThe Greedy Method. Introduction. 0/1 Knapsack Problem
The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton
More informationForecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
More informationSelecting Best Employee of the Year Using Analytical Hierarchy Process
J. Basc. Appl. Sc. Res., 5(11)7276, 2015 2015, TextRoad Publcaton ISSN 20904304 Journal of Basc and Appled Scentfc Research www.textroad.com Selectng Best Employee of the Year Usng Analytcal Herarchy
More informationMining Multiple Large Data Sources
The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of
More information1 Example 1: Axisaligned rectangles
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton
More informationThe Journal of Systems and Software
The Journal of Systems and Software 82 (2009) 241 252 Contents lsts avalable at ScenceDrect The Journal of Systems and Software journal homepage: www. elsever. com/ locate/ jss A study of project selecton
More informationRisk Model of LongTerm Production Scheduling in Open Pit Gold Mining
Rsk Model of LongTerm Producton Schedulng n Open Pt Gold Mnng R Halatchev 1 and P Lever 2 ABSTRACT Open pt gold mnng s an mportant sector of the Australan mnng ndustry. It uses large amounts of nvestments,
More informationLuby s Alg. for Maximal Independent Sets using Pairwise Independence
Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent
More informationRequIn, a tool for fast web traffic inference
RequIn, a tool for fast web traffc nference Olver aul, Jean Etenne Kba GET/INT, LOR Department 9 rue Charles Fourer 90 Evry, France Olver.aul@ntevry.fr, JeanEtenne.Kba@ntevry.fr Abstract As networked
More information