Predicting Software Development Project Outcomes *


Rosina Weber, Michael Waller, June Verner, William Evanco
College of Information Science & Technology, Drexel University
3141 Chestnut Street, Philadelphia, PA
{rosina.weber,mwaller}@drexel.edu; {june.verner,william.evanco}@cs.drexel.edu

Abstract. Case-based reasoning is a flexible methodology to manage software development related tasks. However, when the reasoner's task is prediction, there are a number of different CBR techniques that could be chosen to address the characteristics of a dataset. We examine several of these techniques to assess their accuracy in predicting software development project outcomes (i.e., whether the project is a success or failure) and identify critical success factors within our data. We collected the data from software developers who answered a questionnaire targeting a software development project they had recently worked on. The questionnaire addresses both technical and managerial features of software development projects. The results of these evaluations are compared with results from logistic regression analysis, which serves as a comparative baseline. The research in this paper can guide design decisions in future CBR implementations to predict the outcome of projects described with managerial factors.

1 Introduction and Background

Software development project failure can be very costly. Risk factors that can determine project failure tend to become evident only in the later stages of the software development life cycle, often too late to steer a project safely back on course. As a result, software project managers can be aided greatly by tools that can identify likely success or failure at an early stage, as well as those factors that may contribute to development problems. Our goal is to develop a tool that can make a good prediction of a project's outcome early in the development life cycle and indicate success or risk factors.
This tool has to be flexible enough to accommodate managerial features and be able to manage a variety of knowledge tasks (e.g., capture, reuse) comprising a knowledge management (KM) system. We chose case-based reasoning (CBR) for this tool because CBR is a methodology [1] that can support the flexible automation of the entire process. Additionally, CBR is appropriate for software development related tasks because the methodology resembles human judgment [2]. CBR also offers advantages to support KM efforts [4, 5, 6, 7]. CBR is characterized as a lazy learner that predicts the outcome of new cases by using a k-nearest neighbor classifier, thus presenting some potential benefits to predict project success (e.g., a relatively small training cost and effort combined with an explanation for the classification [3]). There are a number of different techniques that can be used to implement CBR; the current view is that certain design choices can bias the system's quality [8, 9]. Therefore, we need to test different techniques and select the one that performs best with the problem data.

* D. Bridge and K. Ashley (eds.) Case-Based Reasoning Research and Development. LNAI 2689, Berlin Heidelberg: Springer-Verlag.

Our data was collected from 122 software developers who responded to a questionnaire about a software project they had recently worked on. The questions addressed both technical and managerial features of the chosen software development projects.

Past work using CBR for prediction of software development projects has focused on the technical, quantifiable aspects of software development, predicting, for example, development effort [2, 8, 9] rather than addressing qualitative managerial factors. In [9], Watson et al. compared three CBR techniques and found that the weighted Euclidean distance was the most accurate method for predicting development effort for web hypermedia software. In [2], Finnie et al. compared CBR techniques with linear regression analysis. They predicted effort represented by a continuous dependent variable, which is better suited for linear regression models. Kadoda et al. [8] also predicted software development effort with CBR and compared its performance with stepwise regression. They concluded that there is a strong association between features of the dataset (e.g., training set size, nature of the cost function) and the success of a technique, and that the best technique can be determined on the basis of each dataset [8].

The use of CBR to help manage software engineering projects is commonly associated with the experience factory (EF) [10, 11], a framework that structures the reuse of experience and products obtained during the software development life cycle. For example, Althoff et al. [7] have adopted the EF model to create an experience base to reuse experiences and products obtained in the development of CBR systems. Our approach differs because our core goal is to predict success and provide advice to software project managers. Our strategy is not intended to explicitly record methods employed in one development and enable detailed reuse, but simply to gather general experience in software development projects in order to identify success factors. In our approach, it is the user's responsibility to search for mitigants when a given factor seems to suggest potential failure. Our approach is more superficial and designed for easier implementation and acquisition.

When using CBR to manage or predict success of managerial projects, sometimes the number of features describing these projects may outnumber the number of cases. The way Cain et al. [11] dealt with this problem was by incorporating domain knowledge to choose the features to explain success or failure. They combined CBR with the notion of explanation-based learning (EBL), which uses domain knowledge to generalize a concept from a training example [13]. Their parameterized combination of CBR plus EBL is detailed in Subsection 3.3.

We tested different techniques to assess how well they performed with our data. After testing unweighted measures we used a hill-climbing feedback method to learn weights and prevent imperfect features from participating equally in predicting project outcome. We then introduced and investigated a weighted version of the CBR plus EBL approach. As we identify the best similarity measure to provide accurate outcome predictions, we will introduce a method to use this technique to identify success factors with CBR. Because failure factors may have multiple mitigants, managers will have the information necessary to avoid project pitfalls.

We also used a non-CBR technique, logistic regression (LR) [14], to predict the outcomes for the same dataset. LR is generally considered the most accurate statistical technique for modeling dichotomous dependent variables, particularly in datasets with fewer than 100 examples, but it is also a method that tends to be expensive [14]. Cleary et al. explain that LR is a highly desirable statistical model and should be used for model fitting and hypothesis testing [14]. Hence, we use LR as a point of comparison for testing various CBR techniques.

This paper is organized as follows. Section 2 discusses the questionnaire used for data collection. Section 3 describes the prediction techniques we used. In Section 4 we evaluate the techniques used and describe our results. Section 5 extends the use of the most accurate CBR technique to identify software project success factors. In Concluding Remarks, we present our conclusions and future work.

2 Collecting Software Development Project Data

The collection method is crucial in obtaining reliable data. Once data is collected and understood by a KM system it can be managed to systematically benefit other projects. Verner [15, 16] is a software engineer who collected the data we used in this research to investigate software project critical success factors. The data was collected for the sole purpose of analyzing software development projects and identifying the factors that determine project success or failure.

Table 1. Examples of questions in the questionnaire

ID      Question
q2_1    What was the level of involvement of the customers/users?
q2_4    Were the customers/users involved in making schedule estimates?
q3_3    Was the scope of the project well defined?
q3_5    Did the customers/users make adequate time available for requirements gathering?
q3_7    Did the requirements result in well-defined software deliverables?
q4_2    Was the delivery date decision made with appropriate requirements information?
q4_8    Did the project have adequate staff to meet the schedule?
q4_11   Did the schedule take into account staff leave, training, etc.?

Removing all variables that describe features unknown early in a project resulted in a total of 23 variables. The questionnaire incorporates both objective and subjective human judgments about these projects. Table 1 provides some example questions, which will be further discussed in this paper, referenced by their ID. The answers are in yes/no and multiple-choice format. This questionnaire addresses the areas of management support, customer/user interaction, requirements, estimation and scheduling, in relation to project outcome.

The outcome section originally included questions that allowed for conflicting answers. The questionnaire asked for project success or failure from the perspective of the organization and from the perspective of the developer answering the question. We merged these variables into a single variable and removed all projects where outcome variables were in conflict. After eliminating the conflicting answers, the initial group of 122 project records was reduced to 88: 67 describing successful projects and 21 describing failed projects.

3 Techniques to Predict Project Outcomes

Our goal is to design a system to systematically acquire software development project data and understand it within a KM framework. A case-based reasoner predicts the outcome of new projects in this system. In this section, we discuss the CBR system we implemented to evaluate different similarity measures to predict a project's outcome, and describe the CBR techniques and LR.

3.1 CBR Implementation

Our CBR implementation uses the data gathered with the questionnaire discussed in Section 2, from which we created a case base with 88 cases described by 23 features and a binary variable for project outcome. This implementation entails the use of four subtasks of the retrieve CBR process, namely, identify features, initial match, search, and select [17]. These are standard implementations except for the select subtask. In order to accommodate situations in which multiple cases have the same similarity but different outcomes, we broke all ties by selecting the class of the case with the next highest similarity. We treat all features as symbolic; if they have the same value the similarity equals one, otherwise it equals zero; there are no intermediary degrees for similarity.
Currently, our implementation does not identify factors that contribute to success or failure. The results discussed later in this paper will lay the foundation for the development of a method to provide identification of these factors along with an outcome prediction. Suggesting countermeasures based upon these factors is an additional step to incorporate.
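The symbolic matching and the tie-breaking rule of Section 3.1 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the tuple-based case layout, the toy case base, and the reading of the tie-break rule (if the most similar cases disagree on the outcome, move to the next similarity level until a single class remains) are all assumptions.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical case layout: (feature values, outcome label).
Case = Tuple[Sequence[str], str]


def symbolic_similarity(a: str, b: str) -> int:
    """Symbolic features: 1 if the values are equal, 0 otherwise."""
    return 1 if a == b else 0


def match_score(case_features: Sequence[str], cue_features: Sequence[str]) -> int:
    """Unweighted score: the number of features with identical values."""
    return sum(symbolic_similarity(a, b) for a, b in zip(case_features, cue_features))


def predict_outcome(case_base: List[Case], cue: Sequence[str]) -> Optional[str]:
    """Return the outcome of the most similar case(s).

    Assumed tie-break: when the cases at one similarity level disagree,
    fall through to the next highest similarity level.
    """
    outcomes_by_score: Dict[int, Set[str]] = {}
    for features, outcome in case_base:
        outcomes_by_score.setdefault(match_score(features, cue), set()).add(outcome)
    for score in sorted(outcomes_by_score, reverse=True):
        outcomes = outcomes_by_score[score]
        if len(outcomes) == 1:
            return next(iter(outcomes))
    return None  # every similarity level was tied


# Toy data, invented for illustration only.
toy_case_base: List[Case] = [
    (("yes", "no"), "success"),
    (("no", "yes"), "failure"),
    (("no", "no"), "failure"),
]
```

With the toy case base, the cue `("yes", "yes")` matches the first two cases equally well but with different outcomes, so the sketch falls through to the third case.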

3.2 Unweighted k-NN

We first implemented CBR with a traditional unweighted k-NN classifier to serve as a baseline to compare with predictions developed with other techniques. This measure simply considers the number of similar features between a candidate case and a target case.

3.3 Explanation-Based Learning

Explanation-based learning (EBL) uses domain-specific knowledge to generalize a concept [13]. Cain et al. [11] used the EBL notion to explore feature relevance, in a combined approach with CBR, to classify a dichotomous dependent variable because the features in their dataset outnumbered the cases. We investigate their CBR+EBL approach because we are designing a tool with an evolving collection process, and thus having more features than cases is a future possibility. The work described in [11] has successfully classified foreign trade negotiation cases (50 cases, 76 features) by applying the CBR+EBL approach.

The approach entails a parameterized similarity measure that incorporates both elements of traditional CBR (exploring the similarity of features between projects) and elements of EBL (exploring feature relevance supported by domain knowledge). The parameterized Equation (1) of CBR+EBL introduced in [11] is given by similarity measure S:

$$ S = \frac{\sum_{i=1}^{n} \alpha \cdot \mathrm{sim}(case_i, cue_i) + \beta \sum_{i=1}^{n} \mathrm{relevance}(case_i) \cdot \mathrm{sim}(case_i, cue_i)}{\alpha n + \beta \sum_{i=1}^{n} \mathrm{relevance}(case_i)} \tag{1} $$

To implement the CBR+EBL approach, we need to acquire domain-specific knowledge and choose appropriate values for α and β.

Knowledge acquisition for EBL. This knowledge elicitation process is aimed at determining a relevance factor 1 or 0 for each possible question answered in the questionnaire. These factors represent, respectively, whether or not the answers influence the project's outcome. Thus, a factor 1 is given when the answer for a question is such that it supports the outcome of that specific project. For example, software engineering knowledge mandates that, to be successful, projects should have a schedule (consequently the lack of a schedule could explain failure). Therefore, there are two possible answers for the question asking whether the project had a schedule: yes or no. The final factor 1 or 0 can only be obtained when we also consider the given project's outcome. Thus, in a project with outcome of success, if the answer to the question about the existence of the schedule is also yes, then the relevance factor is 1 because, based on domain knowledge, we can state that this answer supports the outcome of success. The relevance factors vary as shown in Table 2.

We interviewed two software engineers for consistency and represented the knowledge with simple rules. For example, Figure 1 shows a multiple-choice question from the customer/user section of the questionnaire, and its associated rule obtained through knowledge acquisition from experts.

Table 2. Relevance factors for question about existence of schedule

Does the project have a schedule?   Project outcome is   Relevance factor is
Yes                                 Success              1
Yes                                 Failure              0
No                                  Success              0
No                                  Failure              1

By implementing rules for all features where an association occurs based on domain-specific knowledge, we were able to determine the relevance factor of a particular feature for a case. Some features can be left without rules because no association was evident. For example, the Yes/No question q1_7, "Did senior management impact the project in any other way?", does not directly assess the type of impact and thus was the only feature in our dataset left without a rule. This question exemplifies how we use EBL to help evolve the collection method, by confirming the importance of a given question through its impact on project outcomes (question q1_7 will be removed or reworded in the next version of the questionnaire).

Question: 2.1 What was the level of involvement of the customers/users?
1. none  2. little  3. some  4. reasonable level  5. high involvement
Rule: IF ((outcome is success) AND (answer is either (4) OR (5))) OR ((outcome is failure) AND (answer is either (1) OR (2))) THEN relevance factor equals 1, otherwise relevance factor equals 0.

Fig. 1. Example question and its associated rule

Experts were able to provide associations for nearly all questions because the entire questionnaire was conceived and designed with the sole purpose of analyzing software development projects. Considerable research has been published in this area [e.g., 16, 18].

An aspect that we did not implement concerns combinations of features. It is possible, and very likely, that experts using domain knowledge could find associations between two or more features. One association could explain a given outcome, while another may neutralize the effect of individual features. We did not extend our investigation to address this, although we will consider this aspect in future research.
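The two relevance rules given in the text (the schedule rule of Table 2 and the customer/user involvement rule of Fig. 1) can be written as small functions. This is only a sketch: the function names and the string/integer encodings of answers and outcomes are assumptions made for illustration, and the remaining rules elicited from the experts are not reproduced here.

```python
def relevance_schedule(answer: str, outcome: str) -> int:
    """Table 2: the answer is relevant (1) when it supports the outcome.

    Assumed encoding: answer in {"yes", "no"}, outcome in {"success", "failure"}.
    """
    supports = {("yes", "success"), ("no", "failure")}
    return 1 if (answer, outcome) in supports else 0


def relevance_q2_1(answer: int, outcome: str) -> int:
    """Fig. 1 rule for question 2.1 (level of customer/user involvement).

    Assumed encoding: answer is the option number 1..5.
    """
    if outcome == "success" and answer in (4, 5):
        return 1
    if outcome == "failure" and answer in (1, 2):
        return 1
    return 0
```

For example, `relevance_q2_1(3, "success")` evaluates to 0 because option 3 ("some") appears in neither branch of the rule.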

Determining α and β. Equation (1) expresses similarity S as the weighted distance between two cases that incorporates β as a weight measurement for the relevance factor. In the EBL component, features are considered relevant when their values support the outcome of each case, and thus they are assigned a relevance factor. Therefore, different cases have different sets of relevant features. The CBR component of the equation, represented by α, explores the similarity of features between cases.

The final step in applying the parameterized equation from [11] is to define values for α and β. The authors of [11], who conceived the equation, do not explicitly recommend values for α and β; they used α = 1 and β = 15. They stated that the equation is not very sensitive to β and that β values greater than one will produce similar results. The baseline for these parameters occurs when α = 0, in which case only the EBL component is evaluated and different values for β do not impact the accuracy. When β = 0, however, the equation evaluates simple feature counting (unweighted k-NN).

We also evaluated the sensitivity of variations of these two parameters with respect to our data. Initially we set α = 1, and then 5, 10 and 20, and varied β in search of the greatest accuracy. We found the maximum accuracy to occur in multiple pairs of points for α and β. To account for the sensitivity of these parameters in different datasets, to ensure consistency of the results, and given that the authors in [11] claim that values above one for β do not impact the results, we chose to use four different pairs for α and β: (1, 1), (5, 7), (10, 11.5), and (20, 32). Though we performed all the tests using all the pairs, we will present only the results obtained with the first pair (1, 1).

Given the bias imposed by applying the same effect to all features (some may be irrelevant) [19] on the case-based component of the formula, we next investigate variable feature weights.
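The behaviour of α and β described above can be checked with a small sketch of the similarity measure. It assumes that Equation (1) has the form S = (α Σ sim_i + β Σ relevance_i · sim_i) / (α n + β Σ relevance_i), which is the reading consistent with the two limiting cases stated in the text; the function name, argument layout and toy values are illustrative assumptions.

```python
from typing import Sequence


def cbr_ebl_similarity(
    case: Sequence[str],
    cue: Sequence[str],
    relevance: Sequence[int],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> float:
    """Assumed form of Equation (1) for symbolic features.

    relevance[i] is the 0/1 relevance factor of feature i for the stored case.
    """
    n = len(case)
    sims = [1 if a == b else 0 for a, b in zip(case, cue)]
    numerator = alpha * sum(sims) + beta * sum(r * s for r, s in zip(relevance, sims))
    denominator = alpha * n + beta * sum(relevance)
    return numerator / denominator if denominator else 0.0


# Toy values, invented for illustration only.
toy_case = ("yes", "no", "4", "yes")
toy_cue = ("yes", "yes", "4", "no")
toy_relevance = (1, 0, 1, 1)
```

Under this reading, setting `beta=0` reduces the measure to the fraction of matching features (simple feature counting), and setting `alpha=0` makes β cancel between numerator and denominator, which matches the statement that different β values then have no effect.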
Our goal is to extend the CBR+EBL approach (Equation (1)) by adding a representation of relative feature relevance to that awarded by domain knowledge. Hence, once we have the feature weights for all the variables, we implement the combined version of the parameterized equation:

$$ S' = \frac{\sum_{i=1}^{n} w_i \cdot \mathrm{sim}(case_i, cue_i) + \beta \sum_{i=1}^{n} \mathrm{relevance}(case_i) \cdot \mathrm{sim}(case_i, cue_i)}{\sum_{i=1}^{n} w_i + \beta \sum_{i=1}^{n} \mathrm{relevance}(case_i)} \tag{2} $$

where we use individual weights w_i instead of α. To ensure the consistency of the results, we compute S' (Equation (2)) for β = 1 and β = 15. The results are discussed in Section 4.

3.4 Feature Weight Learning

The framework for feature weighting methods described in [19] suggests the use of incremental hill-climbing methods when the dataset contains interacting features. We selected gradient descent (GD), which is a hill-climber that uses feedback from the similarity measure when examining each case. GD can be implemented and modified by adjusting its geometric parameters. We used a starting step size of 0.5, an ending step size of 0.02, a step size update of 0.9, and the number of cases tested was 10. These parameters proved most effective in our preliminary tests. Having obtained weights to account for the relative importance of each feature, we used these weights to predict project outcomes using the weighted k-NN and weighted CBR+EBL.

3.5 Logistic Regression

LR is generally considered the most accurate and theoretically appropriate statistical technique for modeling dichotomous dependent variables, particularly in datasets with fewer than 100 examples, though it is also a method that tends to be expensive [14]. Cleary et al. explain that LR is a highly desirable statistical model and should be used for model fitting and hypothesis testing [14]. It is appropriate for our data because outcome is a dichotomous variable. LR produces a formula that predicts the probability of an occurrence as a function of the independent variables. LR overcomes the problem with linear models producing values for the probability outside the range of (0,1) desired for a dichotomous dependent variable [19]. Unlike ordinary least squares (OLS) regression, LR does not assume linearity of relationship between the independent variables and the dependent variable, does not require normally distributed variables, and in general has less stringent requirements with respect to the data.

4 Evaluation

First, we want our evaluation to determine which of the CBR techniques performs best across three accuracy metrics when predicting project outcomes with our dataset. Second, we want to compare the performance of the CBR techniques to LR for each metric. Third, we want to determine whether the weighted version of CBR+EBL (S') is more accurate than its unweighted version.

4.1 Methodology

We represent the performance of these techniques by using three metrics.
Accuracy represents the number of correct predictions in relation to the total predictions. True positives represent the number of correct predictions of projects with outcomes of success. True negatives give the number of correct predictions of projects with outcomes of failure. In this paper, values of accuracy, true positives and true negatives are expressed as percentages.
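The three metrics can be computed as in the sketch below. The text does not state the denominators used for the percentages, so this sketch assumes that true positives are reported as a share of the actual successes and true negatives as a share of the actual failures; the function name, labels and toy predictions are illustrative.

```python
from typing import Dict, Sequence


def outcome_metrics(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Accuracy, true positives and true negatives, each as a percentage.

    Assumed labels: "success" (positive class) and "failure" (negative class).
    """
    pairs = list(zip(actual, predicted))

    def pct(hits: int, total: int) -> float:
        return 100.0 * hits / total if total else float("nan")

    successes = [(a, p) for a, p in pairs if a == "success"]
    failures = [(a, p) for a, p in pairs if a == "failure"]
    return {
        "accuracy": pct(sum(a == p for a, p in pairs), len(pairs)),
        "true_positives": pct(sum(a == p for a, p in successes), len(successes)),
        "true_negatives": pct(sum(a == p for a, p in failures), len(failures)),
    }


# Toy predictions, invented for illustration only.
toy_actual = ["success", "success", "success", "failure", "failure"]
toy_predicted = ["success", "success", "failure", "failure", "success"]
toy_metrics = outcome_metrics(toy_actual, toy_predicted)
```

In the toy example three of five predictions are correct, two of the three successes are recovered, and one of the two failures is recovered.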

The origin of the data is explained in Section 2. We used the data generated by the questionnaire, which was also prepared for LR. It has 23 symbolic features describing 88 training examples: 67 successes and 21 failures. For the evaluation we used stratified sampling by randomly choosing six pairs of training sets (test sets were the complements), with 44 cases each, maintaining the overall proportion of positive and negative examples across all sets. Training sets 1, 3 and 5 have 33 positive and 11 negative examples; training sets 2, 4 and 6 have 34 positive and 10 negative examples. We will present our results in terms of the average and standard deviation across these six test sets.¹

LR and the weighted forms of CBR require the use of training sets. Training sets were used to either generate equations for LR or learn feature weights. Once completed, the training parameters were then tested on the testing sets, i.e., on data not included in training.

4.2 Results

A summary of our results is presented in Table 3. These results were obtained by applying unweighted k-NN, (unweighted) CBR+EBL, weighted k-NN, weighted CBR+EBL with β = 1, and logistic regression (LR). The results are given with the average and standard deviations across the six test sets.

Table 3. Average and standard deviation for accuracy, true positives, and true negatives
Columns: Technique; Accuracy (Ave., St.Dev); True Positives (Ave., St.Dev); True Negatives (Ave., St.Dev)
Rows: Unweighted k-NN; CBR+EBL; Weighted k-NN; Weighted CBR+EBL; LR
(The numeric entries of this table are missing from this transcription.)

On the first hypothesis evaluated, we found CBR+EBL to be the most accurate among the CBR techniques for all three metrics. It performed just as well as the baseline (unweighted k-NN) in predicting successful projects, and was superior in the other two metrics, though one should note that the standard deviation of the CBR+EBL performance is higher than that of the baseline. The results also confirm the conclusions presented in [11] that CBR+EBL outperforms unweighted k-NN.
These results support the conclusion to adopt CBR+EBL, particularly because we can expect to have case bases in which the number of features is greater than the number of cases, which seems to be no obstacle for CBR+EBL.

¹ Though we believe leave-one-out cross-validation (LOOCV) is preferable, applying LR would require that we develop 88 sets of equations; therefore we relied on stratified sampling.
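The stratified sampling used in Section 4.1 (six training sets of 44 cases, alternating between 33 positive/11 negative and 34 positive/10 negative, with the complements as test sets) can be sketched as below. The helper name, the fixed random seed and the index-based representation are assumptions for illustration; the actual sampling procedure and random draws are not given in the text.

```python
import random
from typing import List, Tuple


def stratified_half_split(
    outcomes: List[str], rng: random.Random, positives_in_training: int
) -> Tuple[List[int], List[int]]:
    """Split case indices into a training half and its complement (test set)."""
    positives = [i for i, o in enumerate(outcomes) if o == "success"]
    negatives = [i for i, o in enumerate(outcomes) if o == "failure"]
    rng.shuffle(positives)
    rng.shuffle(negatives)
    negatives_in_training = len(outcomes) // 2 - positives_in_training
    train = positives[:positives_in_training] + negatives[:negatives_in_training]
    test = positives[positives_in_training:] + negatives[negatives_in_training:]
    return sorted(train), sorted(test)


# 88 cases: 67 successes and 21 failures, as reported in the text.
outcomes = ["success"] * 67 + ["failure"] * 21
rng = random.Random(0)  # arbitrary seed, an assumption for reproducibility
# Sets 1, 3, 5 get 33 positives; sets 2, 4, 6 get 34 positives.
splits = [
    stratified_half_split(outcomes, rng, 33 if k % 2 == 0 else 34)
    for k in range(6)
]
```

Each element of `splits` is a `(training_indices, test_indices)` pair of 44 cases each.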

With respect to the second hypothesis evaluated, CBR+EBL slightly outperformed LR in the average accuracy, but LR presented a higher standard deviation. In the average of true positives, LR was superior. LR predicted successful projects well, but was not able to predict failed cases as accurately (less than 30% for true negatives). This was likely due to the sparsity of data among the group of failed software projects (i.e., 21 out of 88). Inclusion of additional failed project cases would very likely improve these results.

Given that the data for successful projects is sufficiently dense, the metric true positives in Table 3 emphasizes the loss in accuracy caused by using a combination of feature weights and the EBL measure (weighted CBR+EBL). This is the only technique not able to predict at least 80% of successful projects. Even the unweighted k-NN performs well, easily finding similar cases among positive examples. LR is the most accurate in this metric.

The relative lack of negative examples (or failed projects) makes the accuracy of CBR+EBL stand out. This is probably the reason why CBR+EBL tends to be more accurate in general. Even with fewer negative examples, it provides better predictions. Our results suggest that there is an advantage in using CBR, especially CBR+EBL, with this type of data, which may often be sparse in real world problems. Therefore, we recommend CBR+EBL to predict the (binary) outcome of software development projects.

For the third hypothesis evaluated, we wanted to compare the performance of the weighted version of CBR+EBL (S') with respect to the unweighted S. S' provided the lowest accuracy with the highest standard deviation. It actually performed more poorly in predicting successful projects from dense data than it did predicting failed cases from sparse data. These results suggest investigating further why the weighted version of CBR+EBL did not perform well in accuracy and true positives but performed much better (with respect to other techniques) in true negatives.
The variation in performance of LR suggests an association with the number of negative examples; hence we want to investigate if a similar association could be made.

Possible causes for performance of S'. Table 3 shows that the weighted version of CBR+EBL outperforms all techniques that do not use domain knowledge for the metric true negatives. This may suggest that the combination of both the CBR and EBL components would be appropriate to learn from sparse data (and predict failed projects). When we compare the performance of the unweighted versions alone, CBR+EBL performs (a little) better than k-NN for accuracy, exactly the same as k-NN for true positives, and (much) better than k-NN for true negatives. This comparison suggests that when we add the EBL component, the performance improves with respect to true negatives. Additionally, if we analyze the weighted versions, the weighted CBR+EBL performs worse than the weighted k-NN. It performs just as poorly for true positives and better only for true negatives.

These facts, combined with the high standard deviations found for the weighted CBR+EBL measure, call for further examination. Given this preliminary analysis, our hypothesis is that when we combine feature weights with the EBL component, it overestimates the relative importance of some features. This is detrimental to predictive accuracy with dense data, but when applied to sparse data, the method seems to work fairly well. In future work, we will perform a second experiment to evaluate this hypothesis; in this paper, we simply examine our results further.

In order to fully gauge the effect of combining feature weights with EBL, we examine the weights and the EBL component in different test sets. The weights were determined by gradient descent (see Section 3.4), while the EBL component is provided by the assignment of relevance factors (see Section 3.3). Table 4 ranks the six variables that received the largest number of relevance factors overall. The final row shows the performance of the weighted CBR+EBL for true positives for these sets. The two columns in Table 4 dedicated to the two test sets show each variable's overall ranking within each test set based upon their feature weights (the higher the number, the smaller the weight). For example, variable q3_7² was fourth among the variables overall in terms of the number of relevance factors assigned, ranked fourth in test set 2 and 16th in test set 6.

Table 4. Relevance factors and weights in test sets 2 and 6
Columns: Rank; Variable ID; Test Set 2; Test Set 6 (final row: True Positives)
Variable IDs in rank order, as extracted (truncated): q3_, q3_, q4_, q3_, q4_, q3_
(The rankings and true-positive values of this table are missing from this transcription.)

The analysis of Table 4 shows that five of the six variables were heavily weighted in test set 2 (rankings 1, 3, 4, 5, 7) and lightly weighted in test set 6 (rankings 10, 16, 17, 19). When comparing these rankings to the performance for true positives, it is clear that where the majority of these variables were heavily weighted, weighted CBR+EBL performed the most poorly and where these variables were most lightly weighted, CBR+EBL performed the most accurately.
This supports our interpretation that prediction accuracy decreases when the same features receive high weights and are assigned relevance factors in the majority of the cases (overestimating the relative importance of some features). We will evaluate this hypothesis in future work, because if the combination of the two techniques increases prediction accuracy when the dataset is sparsely populated, this measure can be used in these cases.

² See the meaning of this feature in Table 1.

5 Associating Outcomes to Project Management

This paper's final challenge is to make use of the most accurate technique to perform an additional task. We would like to determine which factors have the strongest associations with particular project outcomes. These could then be highlighted as critical risk factors in projects heading for failure, allowing a project manager to identify key strengths and weaknesses early enough to establish corrective measures when needed.

Given the suitability of LR, we again use it as a benchmark. The LR process includes the identification of the variables that most strongly predict the dependent variable; thus, predictor variables are a byproduct of LR. Based on LR, the variables that have the most influence on project outcome are q3_3, q3_5, q4_2, and q4_8.

We use the parameterized equation S (CBR+EBL) to suggest predictor variables by predicting the outcome of each of the four questionnaire sections separately. We compare these prediction results with the prediction generated using the entire dataset. Isolated problem areas and features that predict outcome nearly as well as the entire dataset are assumed to be those most responsible for project outcome.

Table 5. Average accuracy (in %) for the four sections across the six test sets
Columns: Problem Area; Ave.; Std dev.
Rows: Management Support; Customer/User; Requirements; Estimation/Schedule
(The numeric entries of this table are missing from this transcription.)

Table 5 shows the average accuracy in each of the four sections for the six test sets. Given the similarity of average accuracy of the three problem areas customer/user, requirements, and estimation-schedule, our strategy is to further examine these three areas for potentially useful predictors. We will exclude the management support section because of its lower accuracy when compared to the other sections.

In order to further investigate the three problem areas, we assess the frequencies of the relevance factors in each test set. Our assumption is that features that have scored a higher number of relevance factors are those most responsible for the project outcome.
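Counting how often each feature receives a relevance factor of 1 in a test set can be sketched as follows. The data layout (one dictionary of 0/1 relevance factors per case), the function name and the toy values are assumptions made for illustration and are not taken from the study's data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def relevance_frequencies(
    relevance_by_case: List[Dict[str, int]]
) -> Dict[str, Tuple[int, float]]:
    """For each feature ID, return (times assigned relevance 1, % of cases)."""
    counts: Counter = Counter()
    for factors in relevance_by_case:
        for feature_id, factor in factors.items():
            counts[feature_id] += factor
    n_cases = len(relevance_by_case)
    return {fid: (c, 100.0 * c / n_cases) for fid, c in counts.items()}


# Toy relevance factors for four cases, invented for illustration only.
toy_relevance_by_case = [
    {"q3_5": 1, "q3_3": 0},
    {"q3_5": 1, "q3_3": 1},
    {"q3_5": 0, "q3_3": 0},
    {"q3_5": 1, "q3_3": 0},
]
toy_frequencies = relevance_frequencies(toy_relevance_by_case)
```

Features can then be ranked by their counts within each test set, and the counts averaged across test sets.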
We note that these tests were performed on test sets with 44 cases, so that a feature that has, for example, been assigned a relevance factor 29 times has influenced 65.9% of the cases. Table 6 summarizes the averages across the six test sets for the four most relevant variables in each of the three sections. These variables are the ones with the best potential to be predictors. Among these variables, q3_3 and q3_5 in the requirements section, and q4_8 and q4_2 in estimation-schedule are also the variables identified by LR.

Table 6. Average assignments of relevance factors
Columns: Customer-user (Variable ID, Ave); Requirements (Variable ID, Ave); Estimation and Schedule (Variable ID, Ave)
(The variable IDs are truncated and the averages missing in this transcription; the only surviving entry reads "q4_1lst 22.5".)

We now examine the variable with the highest relevance factor frequencies (q3_5) to determine whether it has been valued because it is relevant to successful or failed projects. For this last analysis, we do not want to use averages, so we select test sets five and six because they are the sets that present relevance frequencies that are the most similar to the averages for the requirements section. In test set 5, feature q3_5 supported successful outcomes in 87.9% of the cases, that is, 29 out of 33. It supported failure outcomes in 45.5% of the cases, 5 out of 11. In test set 6, q3_5 supported success in only 26 of 34 cases, which represented 76.5% of the cases. The number of failed projects supported by q3_5 in this set was 7, representing 70% of the cases (7 out of 10). Looking exclusively at these two test sets, q3_5 supported success in over 80% of the cases and supported failure in almost 60% of the cases. It would be premature to state whether this feature can be considered as a critical success factor in successful or failed projects. Further study is necessary to identify a method to validate such conditions for a variable. Because this variable deals with the time dedicated for requirements gathering by the customer/users, it is easy to accept it conceptually as a critical factor for both success and failure in this instance.

These tests are not conclusive but show promising results for further research into automatically determining success and risk factors. Further research will delve more deeply into methods for identifying critical factors for success and failure; this will help project managers to better understand failure and how to increase chances of success by taking action early in software project development.

6 Concluding Remarks

This paper extends our understanding of the amenability of using CBR to support KM tasks for managing knowledge from projects described by managerial factors.
We conclude that the parameterized CBR+EBL similarity measure [11] is the most accurate technique for this application. The second most accurate CBR technique, weighted k-nn, performed nearly as accurately as CBR+EBL overall, but it did not perform as well for predicting project failures because it appears to be less tolerant of sparse data sets. When the dataset was sufficiently dense, CBR+EBL performed at or above the level of all other methods and it stood out when the data was sparse. For this reason, the CBR+EBL measure is the best choice for this type of prediction.

With respect to the CBR techniques we used, one intriguing result was that the weighted k-nn performed similarly to CBR+EBL (S) in some instances but combining the two methods (S ) generally decreased prediction accuracy. We believe that this is because the two methods provide similar relative levels of importance to the same data and a combination of these methods ultimately overstates the relative strength of these variables. In general, this may lead to less accurate prediction results. As discussed in Subsection 4.2, however, when we analyzed prediction for sparse data sets, it is possible that the combination of these methods will allow for a strong association between key factors and outcomes. It is also important to note, however, that even with sparse data, the combination of these methods did not outperform the CBR+EBL method alone. Probing this question more deeply will be a focus of future research.

The accuracy provided by logistic regression suggests its use as a benchmark for the prediction accuracy of a case base, when the dependent variable is dichotomous. These techniques cannot be considered as competitive alternatives for supporting a KM framework because LR poses higher engineering requirements, as it requires statistical expertise and a complex process of analysis. These tasks may be more easily performed using a CBR technique, which has less stringent engineering requirements. The primary cost benefit of CBR is manifested through automation: knowledge acquisition and system reuse may be fully automated so that staff members may use the tool without needing in-depth knowledge about the techniques used by the tool.

We have shown how to detect potential predictor variables for project success. Our work identifies q3_3, a well defined project scope; q3_5, customers/users making adequate time for requirements gathering; q4_2, a delivery decision (schedule) made with adequate requirements information; and q4_8, assignment of adequate staff to meet the project schedule, as the success factors in our dataset. It is noteworthy that CBR has found q3_5 to be the most important factor for both project success and failure.
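The observation about q3_5 rests on the counts reported for test sets 5 and 6 in the preceding section. As a minimal sketch, the following Python snippet recomputes those percentages from the stated counts only; the dictionary layout and the helper names `pct` and `pooled` are illustrative assumptions, not part of the CBR implementation described in this paper.

```python
# Counts reported in the text for feature q3_5 (test sets 5 and 6).
# Each entry is (cases supported by q3_5, total cases with that outcome).
# The data layout and helper names are illustrative assumptions.
support = {
    "test set 5": {"success": (29, 33), "failure": (5, 11)},
    "test set 6": {"success": (26, 34), "failure": (7, 10)},
}


def pct(supported, total):
    """Share of supported cases as a percentage, rounded to one decimal."""
    return round(100 * supported / total, 1)


def pooled(outcome):
    """Pool the counts for one outcome across both test sets."""
    supported = sum(counts[outcome][0] for counts in support.values())
    total = sum(counts[outcome][1] for counts in support.values())
    return supported, total


# Per-test-set support rates.
for name, counts in support.items():
    for outcome, (k, n) in counts.items():
        print(f"{name}, {outcome}: {k}/{n} = {pct(k, n)}%")

# Support rates over the two test sets taken together.
for outcome in ("success", "failure"):
    k, n = pooled(outcome)
    print(f"pooled {outcome}: {k}/{n} = {pct(k, n)}%")

# A feature assigned a relevance factor 29 times in a 44-case test set.
print(f"29 of 44 cases = {pct(29, 44)}%")
```

Pooling the two test sets gives 55 of 67 successful cases and 12 of 21 failed cases, which is consistent with the "over 80%" and "almost 60%" figures quoted for q3_5. The success and failure counts in each test set also sum to the 44 cases per test set stated earlier.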
As project managers well know, estimation of a reasonable schedule is impossible without good requirements; good requirements and a well-defined project scope go hand-in-hand; and good requirements are essential for assigning adequate staff to a project schedule.

There are a few areas of future work that we wish to explore: evaluate the causes of the performance of S, validate predictor variable identification, investigate the use of predictor variables to derive mitigants for project management, and analyze combinations of features to assign relevance factors in EBL. In the CBR implementation, we will incorporate the step that indicates the success or failure factors.

Acknowledgements. Rosina Weber's research is supported in part by the National Institute for Systems Test and Productivity at USF under the USA Space and Naval Warfare Systems Command grant no. N C-3244, for L0,

References

1. Watson, I.: CBR is a methodology not a technology. Knowledge Based Systems 12 (5-6) (1999)

2. Finnie, G.R., Wittig, G.E., Desharnais, J.M.: A Comparison of Software Effort Estimation Techniques: Using Function Points with Neural Networks, Case-Based Reasoning and Regression Models. Journal of Systems Software 39 (1997)
3. Aha, D.W.: Feature weighting for lazy learning algorithms. In H. Liu & H. Motoda (eds.): Feature Extraction, Construction and Selection: A Data Mining Perspective. Norwell, MA, Kluwer (1998)
4. Aha, D.W., Becerra-Fernandez, I., Maurer, F., Muñoz-Avila, H. (eds.): Exploring Synergies of Knowledge Management and Case-Based Reasoning: Papers from the AAAI Workshop (Technical Report WS-99-10). Menlo Park, CA, AAAI Press (1999)
5. Watson, I.: Knowledge Management and Case-Based Reasoning: a perfect match? In Proc. of the Fourteenth Annual Conference of the International Florida Artificial Intelligence Research Society. Menlo Park, CA, AAAI Press (2001)
6. Weber, R., Aha, D.W., Becerra-Fernandez, I.: Intelligent lessons learned systems. International Journal of Expert Systems Research & Applications 20 (1) (2001)
7. Althoff, K.D., Nick, M., Tautz, C.: CBR-PEB: An Application Implementing Reuse Concepts of the Experience Factory for the Transfer of CBR System Know-How. In Proc. of 7th German Workshop on Case-Based Reasoning, Würzburg (1999)
8. Kadoda, G., Cartwright, M., Shepperd, M.: Issues on the Effective use of CBR Technology for Software Project Prediction. In Aha, D., Watson, I. (eds.): Case-Based Reasoning Research and Development, LNAI 2080, Springer (2001)
9. Watson, I., Mendes, E., Mosley, N., Counsell, S.: Using CBR to Estimate Development Effort for Web Hypermedia Applications. In Proc. of the Fifteenth Annual Conference of the International Florida Artificial Intelligence Research Society. Menlo Park, CA, AAAI Press (2002)
10. Basili, V.R., Caldiera, G., Rombach, H.D.: Experience Factory. In J.J. Marciniak (ed.): Encyclopedia of Software Engineering 1 (1994)
11. Jedlitschka, A., Althoff, K.-D., Decker, B., Hartkopf, S., Nick, M.: Corporate Information Network (COIN): The Fraunhofer IESE Experience Factory, IESE-Report No /E. Version 1.0, May (2001)
12. Cain, T., Pazzani, M.J., Silverstein, G.: Using domain knowledge to influence similarity judgment. In Proc. of the Case-Based Reasoning Workshop. Washington, DC, Morgan Kaufmann (1991)
13. Mitchell, T., Keller, R., Kedar-Cabelli, S.: Explanation-based generalization: A Unifying View. Machine Learning 2 (1986)
14. Cleary, P.D., Angel, R.: The Analysis of Relationships Involving Dichotomous Dependent Variables. Journal of Health and Social Behavior 25 (1984)
15. Verner, J.M., Overmyer, S.P., McCain, K.W.: In the 25 Years Since the Mythical Man-Month What Have we Learned About Project Management? Information and Software Technology 41 (1999)
16. Procaccino, J.D., Verner, J.M., Overmyer, S.P., Darter, M.: Case Study: Factors for Early Prediction of Software Development Success. Information and Software Technology 44 (2001)
17. Aamodt, A., Plaza, E.: Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. Artificial Intelligence Communications 7(1) (1994)
18. Reel, J.S.: Critical Success Factors in Software Projects. IEEE Software 16(3) (1999)
19. Wettschereck, D., Aha, D.W.: Weighting features. In Veloso, M., Aamodt, A. (eds.): Case-Based Reasoning Research and Development, LNAI 1010, Springer-Verlag (1995)
20. Whitehead, J.: Willingness to Pay for Bass Fishing Trips in the Carolinas. In An Introduction to Logistic Regression, Writing up results. (1998) Available online (last visited 03/31/2003):

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna [email protected] Abstract.

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye [email protected] [email protected] [email protected] Abstract - Stock market s one of the most complcated systems

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Overview of monitoring and evaluation

Overview of monitoring and evaluation 540 Toolkt to Combat Traffckng n Persons Tool 10.1 Overvew of montorng and evaluaton Overvew Ths tool brefly descrbes both montorng and evaluaton, and the dstncton between the two. What s montorng? Montorng

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

Design and Development of a Security Evaluation Platform Based on International Standards

Design and Development of a Security Evaluation Platform Based on International Standards Internatonal Journal of Informatcs Socety, VOL.5, NO.2 (203) 7-80 7 Desgn and Development of a Securty Evaluaton Platform Based on Internatonal Standards Yuj Takahash and Yoshm Teshgawara Graduate School

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1. HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages Assessng Student Learnng Through Keyword Densty Analyss of Onlne Class Messages Xn Chen New Jersey Insttute of Technology [email protected] Brook Wu New Jersey Insttute of Technology [email protected] ABSTRACT Ths

More information

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In ths chapter, we wll learn how to descrbe the relatonshp between two quanttatve varables. Remember (from Chapter 2) that the terms quanttatve varable

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

The Use of Analytics for Claim Fraud Detection Roosevelt C. Mosley, Jr., FCAS, MAAA Nick Kucera Pinnacle Actuarial Resources Inc.

The Use of Analytics for Claim Fraud Detection Roosevelt C. Mosley, Jr., FCAS, MAAA Nick Kucera Pinnacle Actuarial Resources Inc. Paper 1837-2014 The Use of Analytcs for Clam Fraud Detecton Roosevelt C. Mosley, Jr., FCAS, MAAA Nck Kucera Pnnacle Actuaral Resources Inc., Bloomngton, IL ABSTRACT As t has been wdely reported n the nsurance

More information

Selecting Best Employee of the Year Using Analytical Hierarchy Process

Selecting Best Employee of the Year Using Analytical Hierarchy Process J. Basc. Appl. Sc. Res., 5(11)72-76, 2015 2015, TextRoad Publcaton ISSN 2090-4304 Journal of Basc and Appled Scentfc Research www.textroad.com Selectng Best Employee of the Year Usng Analytcal Herarchy

More information

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall SP 2005-02 August 2005 Staff Paper Department of Appled Economcs and Management Cornell Unversty, Ithaca, New York 14853-7801 USA Farm Savngs Accounts: Examnng Income Varablty, Elgblty, and Benefts Brent

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry [email protected] www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Hollinger Canadian Publishing Holdings Co. ( HCPH ) proceeding under the Companies Creditors Arrangement Act ( CCAA )

Hollinger Canadian Publishing Holdings Co. ( HCPH ) proceeding under the Companies Creditors Arrangement Act ( CCAA ) February 17, 2011 Andrew J. Hatnay [email protected] Dear Sr/Madam: Re: Re: Hollnger Canadan Publshng Holdngs Co. ( HCPH ) proceedng under the Companes Credtors Arrangement Act ( CCAA ) Update on CCAA Proceedngs

More information

Using Content-Based Filtering for Recommendation 1

Using Content-Based Filtering for Recommendation 1 Usng Content-Based Flterng for Recommendaton 1 Robn van Meteren 1 and Maarten van Someren 2 1 NetlnQ Group, Gerard Brandtstraat 26-28, 1054 JK, Amsterdam, The Netherlands, [email protected] 2 Unversty of

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Conversion between the vector and raster data structures using Fuzzy Geographical Entities Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,

More information

Demographic and Health Surveys Methodology

Demographic and Health Surveys Methodology samplng and household lstng manual Demographc and Health Surveys Methodology Ths document s part of the Demographc and Health Survey s DHS Toolkt of methodology for the MEASURE DHS Phase III project, mplemented

More information

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University Characterzaton of Assembly Varaton Analyss Methods A Thess Presented to the Department of Mechancal Engneerng Brgham Young Unversty In Partal Fulfllment of the Requrements for the Degree Master of Scence

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht [email protected] 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

An Empirical Study of Search Engine Advertising Effectiveness

An Empirical Study of Search Engine Advertising Effectiveness An Emprcal Study of Search Engne Advertsng Effectveness Sanjog Msra, Smon School of Busness Unversty of Rochester Edeal Pnker, Smon School of Busness Unversty of Rochester Alan Rmm-Kaufman, Rmm-Kaufman

More information

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently.

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently. Corporate Polces & Procedures Human Resources - Document CPP216 Leave Management Frst Produced: Current Verson: Past Revsons: Revew Cycle: Apples From: 09/09/09 26/10/12 09/09/09 3 years Immedately Authorsaton:

More information

SIMPLE LINEAR CORRELATION

SIMPLE LINEAR CORRELATION SIMPLE LINEAR CORRELATION Smple lnear correlaton s a measure of the degree to whch two varables vary together, or a measure of the ntensty of the assocaton between two varables. Correlaton often s abused.

More information

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

A 'Virtual Population' Approach To Small Area Estimation

A 'Virtual Population' Approach To Small Area Estimation A 'Vrtual Populaton' Approach To Small Area Estmaton Mchael P. Battagla 1, Martn R. Frankel 2, Machell Town 3 and Lna S. Balluz 3 1 Abt Assocates Inc., Cambrdge MA 02138 2 Baruch College, CUNY, New York

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

Fuzzy TOPSIS Method in the Selection of Investment Boards by Incorporating Operational Risks

Fuzzy TOPSIS Method in the Selection of Investment Boards by Incorporating Operational Risks , July 6-8, 2011, London, U.K. Fuzzy TOPSIS Method n the Selecton of Investment Boards by Incorporatng Operatonal Rsks Elssa Nada Mad, and Abu Osman Md Tap Abstract Mult Crtera Decson Makng (MCDM) nvolves

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Estimating the Development Effort of Web Projects in Chile

Estimating the Development Effort of Web Projects in Chile Estmatng the Development Effort of Web Projects n Chle Sergo F. Ochoa Computer Scences Department Unversty of Chle (56 2) 678-4364 [email protected] M. Cecla Bastarrca Computer Scences Department Unversty

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

Traffic-light a stress test for life insurance provisions
