Comparing and Combining Evolutionary Couplings from Interactions and Commits


Fasil Bantelay, Motahareh Bahrami Zanjani
Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, Kansas 67260, USA, {ftbantelay,

Huzefa Kagdi
Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, Kansas 67260, USA

Abstract: The paper presents an approach to mine evolutionary couplings from a combination of interaction (e.g., Mylyn) and commit (e.g., CVS) histories. These evolutionary couplings are expressed at the file and method levels of granularity, and are applied to support the tasks of commit and interaction prediction. Although the topic of mining evolutionary couplings has been investigated previously, the empirical comparison and combination of the two types from interaction and commit histories have not been attempted. An empirical study on 3272 interactions and 5093 commits from Mylyn, an open source task management tool, was conducted. These interactions and commits were divided into training and testing sets to evaluate the combined, and individual, models. Precision and recall metrics were used to measure the performance of these models. The results show that combined models offer statistically significant increases in recall over the individual models for change predictions. At the file level, the combined models achieved a maximum recall improvement of 13% for commit prediction with a 2% maximum precision drop.

Index Terms: Mining Software Repositories; Evolutionary Couplings; Mylyn; Interaction History; Commit History

I. INTRODUCTION

Change Impact Analysis (IA) or change prediction in source code has been investigated in the software maintenance community. The main goal of this task is to estimate the complete extent of a proposed change in source code (e.g., due to a new feature or bug report). That is, should a source code entity be changed, what other entities also need to be changed?
Numerous solutions to this task, ranging from traditional static and dynamic techniques to contemporary methods from information retrieval and mining software repositories, have been reported in the literature [1-4]. This shows definite progress in supporting the task; however, much work remains in improving its effectiveness (accuracy). Furthermore, developers may interact with (e.g., navigate, view, and modify) software entities within an Integrated Development Environment (IDE) that may not eventually be committed to the code repository. These interactions could have contributed to locating and/or verifying the entities that were changed due to a change request. In this paper, we investigate complementary ways to improve support for change prediction and associated developer interactions. Evolutionary couplings mined from commits in source code repositories have been used to support the task of change prediction [5-9]. Similarly, interactions recorded with task management tools, such as Mylyn, have shown promise in helping developers [10-12]. We compare the efficacy of evolutionary couplings mined from commits and interactions for the change prediction task. At face value, it could be conjectured that commits (i.e., changed entities) are a subset of interactions (i.e., viewed and changed entities), and that comparing and combining the two should therefore lead to obvious postulates and/or predictable outcomes. Our investigation on the Mylyn dataset found that this subset relationship does not always hold (see Table 1). This fact suggests the potential orthogonality of these two sources and inspires our work. Monitoring and recording developer interactions, however promising, is a relatively recent phenomenon. Its use is arguably not yet prevalent at the scale of source code repositories, i.e., the number of open source projects with interaction histories is far smaller than the number with source code repositories.
Having automatic support for interaction prediction, similar to change/commit prediction, could potentially benefit developers. Our quest is to examine the viability of evolutionary couplings mined from commits in assisting developers with (future) interactions. We also present combination models of commits and interactions to mine evolutionary couplings, with the goal of improving the effectiveness of commit and interaction predictions. Combining the two different, yet somewhat related, histories could lead to redundancy and subsequently create a fallacy of strong (otherwise non-existing) couplings. Thus, we explore and assess different ways of combining the two histories in a systematic and synergetic way. These couplings are demonstrated on commit and interaction prediction tasks at the source code file and method levels of granularity. To the best of our knowledge, these combined approaches were neither attempted nor empirically assessed previously. We conducted an empirical study on 3272 interaction traces and 5093 commits from Mylyn, an open source task management tool. These interactions and commits were divided into training and testing sets to evaluate the combined, and individual, models. Precision and recall metrics were used to measure the performance of these models. The results show that combined models offer statistically significant increases in recall over the individual models for change predictions. The results also show that a model trained from commit histories can predict interactions with a higher precision than models trained from interaction histories.

/13/$31.00 © 2013 IEEE. WCRE 2013, Koblenz, Germany

Figure 1. A snippet of 4 interaction events (labeled 1-4) recorded by Mylyn for a bug issue. In the 1st interaction, the createeditortab method is selected. In the 2nd, 3rd, and 4th interactions, the contextactivated method is indirectly manipulated, then directly selected, and finally edited. A) Method name: createeditortab; B) Class name: ContextEditManager; C) File name: ContextEditorManager.java; D) Parameter types for the createeditortab method: Lorg.eclipse.ui.internal.EditorReference and Ljava.lang.String.

In summary, our paper makes the following contributions:
- A combined approach for mining evolutionary couplings from commit and interaction histories.
- An empirical comparison of the combined evolutionary couplings with the two types of individual couplings for commit and interaction prediction tasks.
- An empirical comparison of the two types of evolutionary couplings mined from commits and interactions for commit and interaction prediction tasks.

The remainder of this paper is organized as follows: Section II presents our approach for mining evolutionary couplings from interactions and commits. Section III describes our empirical evaluation and results on the Mylyn dataset. Section IV presents threats to validity, Section V discusses previous work, and Section VI concludes.

II. COMBINED APPROACH FOR MINING EVOLUTIONARY COUPLINGS

Interaction history has been used to detect both interaction and change couplings [13-15]. Change history from source configuration management systems, such as CVS and SVN, has been used to detect change couplings [5, 9, 16-18]. We combine commits from the SCM with interactions from Mylyn in an attempt to find a better prediction model than interaction or change history alone. Our rationale for combining the two histories includes the following: A combined model could assist developers by recommending potential entities to be interacted with, using training data from commits.
A given project may not possess both interaction and change histories. Combined models may detect additional couplings compared to individual models. Elements that are both interacted with and committed together could signify stronger couplings than those only interacted with or only committed; a combined model therefore has the potential to uncover such couplings.

We use the Mylyn interaction data to mine interaction couplings: programmers who interacted with method X also interacted with method Y. We use commits from the SCM to mine change couplings: programmers who changed method X also changed method Y. We applied these couplings to support interaction and commit prediction tasks. An Interaction Prediction (IP) refers to recommending software entities that may need to be interacted with for a task. A Commit Prediction (CP) refers to recommending software entities that may need to be committed for task completion. To mine Interaction Couplings (IC), we first extract Mylyn interaction files from the bug tracking system, then process them into transactions, and finally employ the association rule mining technique. Sections II.A, II.B, and II.D elaborate on these steps. To mine Change Couplings (CC), we first extract commits from the source code repository, then process them into transactions with additional parsing for the method-level granularity, and finally employ the association rule mining technique. Sections II.C and II.D elaborate on these steps. The commonality and orthogonality of interactions and commits is the guiding force in devising the combined models. The linchpin is the bug number. Two levels of information are considered: 1) the relationship of interactions and commits to a bug # in the issue tracking system and 2) the commonality and uniqueness of entities involved in interactions and commits. Note that a bug report is used to reference any type of issue (e.g., defect, enhancement, or feature) in the bug tracking system.
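To make the notion of mining such couplings concrete, here is a minimal sketch that derives one-to-many rules from transactions by simple co-occurrence counting with a minimum support threshold. The file names and the helper are hypothetical; this is not the authors' tooling, only an illustration of the underlying idea.

```python
from collections import defaultdict

def mine_rules(transactions, min_support):
    """Mine one-to-many rules: for each antecedent entity x, the consequent
    is every entity that co-occurred with x in >= min_support transactions."""
    pair_counts = defaultdict(int)
    for txn in transactions:
        txn = set(txn)
        for x in txn:
            for y in txn - {x}:
                pair_counts[(x, y)] += 1
    rules = defaultdict(set)
    for (x, y), count in pair_counts.items():
        if count >= min_support:
            rules[x].add(y)
    return dict(rules)

# Hypothetical interaction transactions: entities touched in one Mylyn task.
interactions = [
    {"A.java", "B.java"},
    {"A.java", "B.java", "C.java"},
    {"B.java", "D.java"},
]
ic = mine_rules(interactions, min_support=2)
# Only A and B co-occur at least twice, so they are coupled in both directions.
```

The same function applied to commit transactions would yield change couplings; only the input history differs.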
Sections II.E and II.F elaborate on these steps.

A. Interaction Data and Hosting Repositories

Interaction is the activity of programmers in an IDE during a development session, e.g., editing a file, referencing API documentation, or browsing the web from within the IDE. Different tools, such as Mylyn, have been developed to model programmers' actions in IDEs [10, 11, 19-21]. Mylyn monitors programmers' activities inside the Eclipse IDE and uses the data to create an Eclipse user interface focused around a task. The Mylyn interaction data consists of traces of interaction histories. Each historical record encapsulates the set of interaction events needed to complete a task. Once a task is defined and activated, the Mylyn monitor records all the interaction events, the smallest units of interaction within the IDE, for the active task. For each interaction, the monitor captures about eight different types of data attributes [12]. The structure handle attribute contains a unique identifier for the target element affected by the interaction. For example, the identifier of a Java class contains the names of the package, the file to which the class belongs, and the class. Similarly, the identifier of a Java method contains the names of the package, the file, and the class the method belongs to, the method name, and the parameter

type(s) of the method. Figure 1 shows an example of 4 consecutive Mylyn interaction events. For each active task, Mylyn creates an XML trace file called mylyn-context.zip that contains all the interaction events. A trace file contains the interaction history of a task. This file is typically attached to the project's issue tracking system, such as Bugzilla, Trac, or JIRA. The trace files for the Mylyn project are archived in the Eclipse bug tracking system as attachments to bug reports. A bug issue may contain multiple interaction traces; for example, one issue contains 12 trace attachments. Each trace has a unique attachment id and contains the mylyn/context/zip tag to distinguish it from others.

B. Extracting and Processing Interactions into Transactions

We first need to identify bug reports that contain mylyn-context.zip attachment(s), because not all bug issues contain interaction trace(s). To do so, we searched the Eclipse bug tracking system for bugs containing at least one mylyn-context.zip attachment. Another factor to consider is that not all interactions with a system result in committed changes to a source control system. If a bug issue is not fixed with a resolution, it is unlikely that a corresponding commit history exists. Thus, we only searched for bug issues with a Resolved status and a Fixed resolution. We developed a tool to process the search result, which performs the following major tasks:

Downloading trace files: the tool takes the search result from the Eclipse bug tracking site as input and automatically downloads all the trace files to a user-specified directory. The trace files all have the same name, mylyn-context.zip. The tool renames each file using the bug id and attachment id (separated by an underscore), giving each a unique identifier in the directory it resides in. Internally, the tool identifies the trace file id(s) for each bug issue.
If options are specified to output this result, the tool can save the bug ids with the corresponding trace ids in a Java properties file format, the key being the bug id and the values being a comma-separated list of trace ids. It uses the attachment URL pattern to download each trace file in the history by replacing X with the trace id.

Processing trace files: the tool takes the directory that contains the trace files as input and parses each trace file to identify the list of Java files and methods manipulated by each interaction history. We consider each trace file an interaction transaction. For each transaction, the tool outputs the issue number together with a tab-separated list of Java files and methods. We need the issue number to create a link between interaction and commit transactions. The targeted files and methods are identified from the structure handle of the interaction event. Figure 1 (A and C) shows a method name and a file name from events 1 and 3, respectively. Two types of patterns are used to identify file and method interaction targets from the structure handle. The pattern <P>/<S><<K>{<F>.java is used to identify file-level targets, and the pattern <P>/<S><<K>{<F>.java[<C>~<M> is used to identify method-level targets, where P is the name of the project, S is the directory structure containing the target, K is the package name, F is a Java file name with the .java extension, C is a class name, and M is a method name.

Removing noise from transactions: after parsing the trace files, the tool eliminates two types of noise from interaction transactions. Multiple interactions with the same target: Mylyn can create different types of interaction events on the same target in a single interaction history [12]. In Figure 1, the contextactivated method is manipulated by three different kinds of interactions. For the purpose of evolutionary couplings, we only need the first interaction with a software entity in a single interaction history.
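The two structure-handle patterns above can be matched with regular expressions along the following lines. The delimiter syntax and the example handle are reconstructed loosely from the description, so treat both as assumptions rather than Mylyn's exact format.

```python
import re

# Loosely follows <P>/<S><<K>{<F>.java and <P>/<S><<K>{<F>.java[<C>~<M>;
# the real Mylyn handle syntax may differ.
FILE_RE = re.compile(r"^(?P<P>[^/]+)/(?P<S>[^<]*)<(?P<K>[^{]*)\{(?P<F>[^\[]+\.java)")
METHOD_RE = re.compile(
    r"^(?P<P>[^/]+)/(?P<S>[^<]*)<(?P<K>[^{]*)\{(?P<F>[^\[]+\.java)\[(?P<C>[^~]+)~(?P<M>[^\[(]+)"
)

def parse_handle(handle):
    """Return (file, method) targets from a structure handle;
    method is None for file-level interactions."""
    m = METHOD_RE.match(handle)
    if m:
        return m.group("F"), m.group("M")
    f = FILE_RE.match(handle)
    return (f.group("F"), None) if f else (None, None)

# Hypothetical method-level handle built from the names in Figure 1.
h = ("org.eclipse.mylyn.context.ui/src<org.eclipse.mylyn.internal.context.ui"
     "{ContextEditorManager.java[ContextEditorManager~contextActivated")
```

A handle without the `[<C>~<M>` suffix falls through to the file-level pattern, yielding only the file name.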
The tool considers only the first interaction with an element and ignores subsequent events on it. This produces a list of unique files (methods). Unintended interactions: Mylyn does not have a mechanism to avoid unintended interactions with the system. As a result, some processes, such as automated processes and accidental interactions, may lead to unusually large interaction transactions. To avoid detecting interaction couplings from such interactions, we removed large transactions, i.e., those containing more elements than the 3rd quartile of the frequency distribution of the number of elements per interaction.

C. Extracting and Processing Commits into Transactions

Our approach also requires commit data from version archives, such as SVN and CVS. For detecting evolutionary couplings, we need files that have been changed together in a single commit operation. SVN preserves the atomicity of commit operations; however, older versions of CVS did not [22]. For a project hosted in an older CVS repository, we convert the CVS repository into an SVN repository using the CVS2SVN tool, which has been used in popular projects such as gcc. The tool mines file-level commit transactions from the SVN repository. For mining method-level transactions, we used a previously developed tool with some modifications to identify the issue number associated with each commit [16]. For each transaction, we extract the bug id from the commit message. Unless Mylyn is configured to generate an automatic commit message, the bug id may not be found in it. Unlike interaction transactions, commit transactions may not always be associated with a bug id. Similar to interaction transactions, we discarded large commit transactions, which could be due to a branch or merge operation in CVS. Our tool discards commit transactions containing more elements than the 3rd quartile of the frequency distribution of the number of elements per commit.

D.
Detecting Interaction and Change Couplings

Interaction Couplings (IC) identify software entities that were frequently navigated (viewed, changed, or both) together during a single session. Change Couplings (CC) identify software entities that were frequently committed together to a source code repository. In Mylyn, the unit of a session is a task to fix a defect or implement an enhancement request. For detecting both types of evolutionary couplings, we employed the association rule mining technique with different minimum support values, similar to Ying et al.

[9] and Zimmermann et al. [17], specifically the Apriori method [23]. Unlike Zimmermann et al.'s approach, we mined one-to-many association rules, so that the models can start predicting couplings with a single antecedent. Association rule mining is a data mining technique for discovering interesting relationships between different items, in this case software entities, from historical transactions. Let P = {E1, E2, ..., En} be a set of n software entities: fields, methods, classes, or files of a program. Let C = {C1, C2, ..., Cm} be a set of m change transactions and let I = {I1, I2, ..., In} be a set of n interaction transactions. Each element in C and I is a subset of P. An IC is defined as an association rule

X1 → X2 (1)

between two disjoint sets of program elements X1 and X2 in I. Similarly, a CC is defined as an association rule

Y1 → Y2 (2)

between two disjoint sets of program elements Y1 and Y2 in C. X1 and Y1 are called antecedents, X2 and Y2 are called consequents, and s is the minimum support.

E. Combining Commit and Interaction Transactions

To combine the two histories from interactions and commits in a systematic and synergetic way, we used the bug id as a common attribute. As pointed out in Section II.C, some commit transactions may not contain an associated bug id; thus, it is not possible to combine those commit transactions with their corresponding interaction transactions. The second point is that a bug id may be associated with one or more interaction and commit transactions, which results in different kinds of relationships between them. Table 1 highlights the six possible types of relationships that could exist between interaction and commit transactions at a file-level granularity. Additionally, all the entities changed in a commit may not be traceable to the corresponding interaction history.
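The first level of information, linking transactions to bug ids, can be sketched as follows: group hypothetical interaction and commit transactions by bug id and label the relationship of each bug. The bug ids and the helper are invented for illustration and do not come from the Mylyn dataset.

```python
from collections import defaultdict

def relationship_types(interaction_txns, commit_txns):
    """Classify, per bug id, the relationship between interaction
    transactions (ITs) and commit transactions (CTs) as 1-to-1,
    1-to-*, *-to-1, *-to-*, or one-sided. Each input is a list of
    (bug_id, entities) pairs; commits may carry bug_id None."""
    its, cts = defaultdict(int), defaultdict(int)
    for bug, _ in interaction_txns:
        its[bug] += 1
    for bug, _ in commit_txns:
        if bug is not None:          # some commits lack a bug id
            cts[bug] += 1
    kinds = {}
    for bug in set(its) | set(cts):
        ni, nc = its.get(bug, 0), cts.get(bug, 0)
        if ni and nc:
            kinds[bug] = f"{'1' if ni == 1 else '*'}-to-{'1' if nc == 1 else '*'}"
        elif ni:
            kinds[bug] = "interaction-only"
        else:
            kinds[bug] = "commit-only"
    return kinds

its = [(101, {"A.java"}), (102, {"A.java"}), (102, {"B.java"})]
cts = [(101, {"A.java"}), (103, {"C.java"}), (None, {"D.java"})]
kinds = relationship_types(its, cts)
# Bug 101 has one IT and one CT; bug 102 has only ITs; bug 103 only a CT.
```

Only bugs labeled with a two-sided relationship can participate in the redundancy-eliminating combinations described next.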
Considering the types of relationships that could exist between interactions and commits, we devised four different ways of combining interaction and commit histories using the bug id. Let I be the interaction dataset containing Ni interaction transactions and C be the commit dataset containing Nc commit transactions. Both I and C are multisets (mset: multiple-membership set), because duplicate transactions may exist [24]. Let f be a function that returns the multiplicity, the number of occurrences, of a transaction in I or C. The cardinality of an mset, the sum of the multiplicities of its elements, is the number of transactions constituting the dataset. Let B be the set of bug ids. Formally,

I = {((i, Bi), f(i)) : i is an IT and Bi ∈ B} (3)
C = {((c, Bc), f(c)) : c is a CT and (Bc = ∅ or Bc ∈ B)} (4)

where IT and CT stand for the considered interaction and commit transactions, respectively.

TABLE 1. SIX TYPES OF RELATIONSHIPS BETWEEN INTERACTION AND COMMIT TRANSACTIONS (txn stands for a transaction, and file is a Java file; bug ids omitted).

Bug Id | Interaction History | Commit History
1st | 1 txn containing 2 files | 1 txn containing 2 files
2nd | 1 txn containing 1 file | 2 txns each containing 1 unique file
3rd | 3 txns containing 11 unique files | 1 txn containing 2 files
4th | 4 txns containing 10 unique files | 3 txns each containing 2 unique files
5th | 1 txn containing 1 file | No txn
6th | No txn | 1 txn containing 4 files

For the first bug, the two interacted files were finally committed. For the second, the interacted file was finally committed in one of the corresponding commit transactions. For the third, the committed files were among the interacted files in all 3 interaction transactions. For the fourth, 2 of the interaction transactions were the same, the 3rd was a subset of the first 2, and the 4th contained 2 unique Java files; 2 of the files in the first 2 interaction transactions were in the 1st commit transaction; the files in one of the commits were not part of any of the 4 interaction transactions; and the 2 files in the 4th interaction transaction were found in the 3rd commit. Ideally, we would expect an interaction history to exist for each commit transaction, or vice versa; however, this is not always the case for the Mylyn dataset (see Table 1).

In the first combination dataset, denoted by P, we simply concatenate I and C one after the other without any regard to redundant information. The result is an mset with cardinality Ni + Nc; P is defined as the additive union of I and C:

P = I ⊎ C (5)

For the second combination dataset, denoted by Q, we attempted to eliminate redundant elements whenever a one-to-one correspondence is detected between an IT and a CT. A one-to-one relationship between an IT and a CT exists if and only if a bug id is associated with a single IT and a single CT; the first bug in Table 1 satisfies this condition:

Q = { ((i ∪ c, Bi), 1)                    if Bi = Bc and f(i) = f(c) = 1
    { ((i, Bi), f(i)), ((c, Bc), f(c))    otherwise                       (6)

In the third combination dataset, denoted by R, we also attempted to eliminate redundant elements whenever a relationship exists between an IT and a CT. A relationship between an IT and a CT exists if and only if transactions from I and C are associated with the same bug id; the relation could be 1-to-1, 1-to-*, *-to-1, or *-to-*. The bugs in Table 1 that have both interaction and commit transactions satisfy this condition:

R = { ((i ∪ c, Bi), 1)                    if Bi = Bc
    { ((i, Bi), f(i)), ((c, Bc), f(c))    otherwise   (7)

For the fourth combination dataset, denoted by S, we only consider related ITs and CTs, and exclude unrelated transactions from the datasets:

S = { ((i ∪ c, Bi), 1) : Bi = Bc } (8)

Next, we illustrate our point using the dataset presented in Table 1. The msets and the corresponding numbers of transactions constituting the six datasets are (bug ids omitted):

I = {((i1, ·), 1), ((i2, ·), 1), ((i3, ·), 1), ((i4, ·), 1), ((i5, ·), 1), ((i6, ·), 2), ((i7, ·), 1), ((i8, ·), 1), ((i9, ·), 1)}, |I| = 10

C = {((c1, ·), 1), ((c2, ·), 1), ((c3, ·), 1), ((c4, ·), 1), ((c5, ·), 1), ((c6, ·), 1), ((c7, ·), 1), ((c8, ·), 1)}, |C| = 8

P = {((p1, ·), 2), ((p2, ·), 2), ((p3, ·), 1), ((p4, ·), 1), ((p5, ·), 1), ((p6, ·), 1), ((p7, ·), 1), ((p8, ·), 2), ((p9, ·), 2), ((p10, ·), 1), ((p11, ·), 1), ((p12, ·), 1), ((p13, ·), 1), ((p14, ·), 1)}, |P| = 18

Q = {((q1, ·), 1), ((q2, ·), 2), ((q3, ·), 1), ((q4, ·), 1), ((q5, ·), 1), ((q6, ·), 1), ((q7, ·), 1), ((q8, ·), 2), ((q9, ·), 2), ((q10, ·), 1), ((q11, ·), 1), ((q12, ·), 1), ((q13, ·), 1), ((q14, ·), 1)}, |Q| = 17

R = {((r1, ·), 1), ((r2, ·), 1), ((r3, ·), 1), ((r4, ·), 1), ((r5, ·), 1), ((r6, ·), 1)}, |R| = 6

S = {((s1, ·), 1), ((s2, ·), 1), ((s3, ·), 1), ((s4, ·), 1)}, |S| = 4

|I|, |C|, |P|, |Q|, |R|, and |S| are the numbers of transactions constituting each dataset, computed by adding the multiplicities of the elements in the given dataset. The examples above show that P always results in the largest number of transactions and S always results in the smallest number of transactions among the datasets.

TABLE 2. MYLYN PROJECT INTERACTION AND COMMIT HISTORIES FOR THE PERIOD JUNE 18, 2007 TO JULY 01, 2011. Parameters: transactions; max. elements/transaction; min. elements/transaction; avg. elements per transaction; association with a bug id. Reported for interactions (3272 traces; files and methods) and commits (5093 revisions; files and methods). All interaction transactions are associated with a bug id.

F. Prediction Models

We detected evolutionary couplings from the six groups of datasets in Section II.E using association rule mining. The association rules form our coupling-based prediction models. We refer to the two individual models, corresponding to datasets I (1) and C (2), as the Interaction Model (IM) and the Commit Model (CM), respectively. The four combined models are referred to as CpM, CqM, CrM, and CsM, corresponding to datasets P (3), Q (4), R (5), and S (6), respectively (see Section II.E).

III.
EMPIRICAL EVALUATION

In the empirical study, we investigated how well our combined approaches for evolutionary couplings performed in predicting future interactions and commits (IP and CP). We simulate the perspective of a developer who is interacting within an IDE to implement (and commit code related to) a change request. The performance is assessed using two quantitative metrics from information retrieval.

A. Research Questions

We addressed the following research questions (RQ) in our case study:
RQ1. How do interaction- and change-based evolutionary couplings, trained from datasets I and C respectively, perform for IP and CP, and how do they compare with each other?
RQ2. How do the prediction models trained on the combined datasets perform for IP and CP, and how do they compare with the individual models from interactions and commits?
RQ3. How much do the different combined models differ in performance for IP and CP?

B. Subject Software System

The empirical evaluation requires an adequate amount of both interaction and commit history. We focused our evaluation on the Mylyn project, which contains about 4 years of interaction data. It is the Eclipse Foundation project with the largest number of interaction history attachments. It is mandatory for Mylyn project committers to use the Mylyn plug-in, which partly explains why there is more interaction data for the Mylyn project than for other Eclipse Foundation projects. Mylyn does not have interaction data for its entire lifetime: commit history started 2 years prior to that of interaction, and commits to the Mylyn CVS repository terminated on July 01, 2011. To get both interaction and commit histories within the same period, we considered the history between June 18, 2007 (the first day of interaction history attachment) and July 01, 2011 (the last day of commit to the Mylyn CVS repository).

1) Interaction Dataset: The Mylyn project consists of 2275 bug issues containing 3272 interaction trace files.
About 1721 (76%) of the bug issues are associated with only one trace file. After preprocessing the traces and filtering out noise, 2357 file-level and 2174 method-level transactions were identified. Table 2 provides information about the file and method levels of interaction transactions for the Mylyn project. There are more file-level interaction transactions than method-level interaction transactions. This difference may be due to the fact that Mylyn propagates lower-level interaction events into their parents: an event that takes place on a method, for instance, also affects the encompassing class, which in turn affects the encompassing file, package, and so on. The average number of files per transaction is greater than the average number of methods per transaction, which could be the result of interactions with more than one method in a single file.

2) Commit Dataset: The Mylyn project contains 5093 revision histories. Out of the 5093 change sets, 3727 revisions contain a change to at least one Java file and 2058 revisions contain a change to at least one Java method. About 3572 (96%) of the file-level changes and 1947 (95%) of the method-level changes are associated with bug issues. According to Table 2, there are more commit transactions than interaction transactions. This difference could happen if programmers use a single Mylyn task for more than one commit, or if they forget to create Mylyn tasks for every change they make, as required by the Mylyn committers guideline. Interaction transactions are larger in size than commit transactions; that is, programmers typically interact with more entities than they change.

C. Training and Testing Sets

Both the interaction and commit datasets are split into two groups: training and testing sets. The training sets are used to mine association rules, i.e., evolutionary couplings, and the testing sets are used to measure the effectiveness of the rules for IP and CP. We used the first 75% of the transactions for the training set and the remaining 25% for the testing set. We have two individual models, based on interactions and commits alone, and four combined models, based on different ways of combining interactions and commits from the same period of history (see Section II), at two levels of granularity. Therefore, we have a total of 12 models, 6 each at the file and method levels, and correspondingly a total of 12 training sets. Figure 2 shows the number of file- and method-level transactions for the six different groups of training sets. Our models are evaluated on two tasks, interaction and commit prediction (IP and CP), each at two levels of granularity (file and method). Therefore, four testing datasets were produced: two interaction-testing sets (one each for the file and method levels) for IP and two commit-testing sets (one each for the file and method levels) for CP. For IP, the number of file-level transactions was 589 and the number of method-level transactions was 543. For CP, the number of file-level transactions was 932 and the number of method-level transactions was 514. The 6 models trained at the file level were evaluated against the IP and CP file-level testing sets, and the 6 models trained at the method level against the IP and CP method-level testing sets.

D. Performance Metrics

To evaluate the accuracy of the six prediction models over all the transactions in the testing sets, we used two popular measures from information retrieval: precision and recall [25].
Precision is the proportion of predicted files/methods that are correct:

Precision (p) = TP / (TP + FP) (9)

Recall is the proportion of actual files/methods predicted correctly:

Recall (r) = TP / (TP + FN) (10)

where TP (true positives) are predicted files/methods that are relevant, FP (false positives) are predicted files/methods that are not relevant, and FN (false negatives) are relevant files/methods that are not predicted. For each transaction in the testing sets, we determined the first file/method to be interacted with (for IP) or the first file/method to be changed (for CP). Mylyn records the time stamp of each interaction event, so we used this value to determine the first file/method to be interacted with in each transaction of the interaction testing dataset. A commit transaction, however, does not identify the file/method that was changed first. Consequently, we make predictions assuming each file/method has an equal chance of being changed first; in this case, the precision and recall values for a CP become the average over all the predictions obtained by considering each element of a commit transaction as the starting point of the change.

Figure 2. Number of transactions in the training sets of the models.

If a model does not make any predictions, both precision and recall are undefined; in that case, we did not compute them. To account for this scenario, we report the probability of a model making predictions for IP and CP (regardless of whether the predictions are correct). This probability is termed likelihood and is given by [17]:

Likelihood (l) = Total Predictions / No. of Test Cases (11)

Unlike precision and recall, a testing set has only a single value for likelihood.

E. Hypotheses Testing

We derived testable hypotheses to evaluate our research questions.
We list only the null hypotheses because the alternative hypotheses can easily be derived from them.

H0-1: There is no difference among the precision values of the six models for file-level interaction predictions.
H0-2: There is no difference among the recall values of the six models for file-level interaction predictions.
H0-3: There is no difference among the precision values of the six models for method-level interaction predictions.
H0-4: There is no difference among the recall values of the six models for method-level interaction predictions.
H0-5: There is no difference among the precision values of the six models for file-level commit predictions.
H0-6: There is no difference among the recall values of the six models for file-level commit predictions.
H0-7: There is no difference among the precision values of the six models for method-level commit predictions.
H0-8: There is no difference among the recall values of the six models for method-level commit predictions.

Each pair of the above precision and recall hypotheses corresponds to one of the testing sets. For example, the interaction-testing set at the file level is used for hypotheses H0-1 and H0-2, and the one at the method level is used for hypotheses H0-3 and H0-4. Likewise, the commit-testing set at the file level is used for hypotheses H0-5 and H0-6, and the one at the method level is used for hypotheses H0-7 and H0-8. For each hypothesis, we compared the 6 models on one testing set at a

time and did not compare hypotheses and results on different testing sets against each other. Note that the stated null hypotheses refer to statistically significant differences.

To analyze the differences between the values reported by each model, we computed the average values of precision and recall for each support threshold. The precision and recall values are compared using a precision-recall curve. We performed an analysis of variance (ANOVA) test with α = 0.05 to validate whether there is a statistically significant difference between the models.

F. Evaluation Results

Figure 3 and Figure 4 show the precision and recall curves for IP and CP for the 3 support thresholds (1, 2, and 3) at the file (a) and method (b) levels of granularity. Each data point represents the average precision and recall over all the transactions in the testing set. The lines connecting the precision-recall pairs at each threshold show the trade-off between precision and recall. Figure 5 shows the outcome of the ANOVA test. Note that the metric values in the charts are reported as fractions.

1) Interaction Prediction (IP): File level: From Figure 3(a), at the file-level IP, we can see that IM and the combination models resulted in similar performance, with the exception of CM. CqM and CrM achieved the highest recall value, with a gain of 3% as compared to IM and a loss of 1% in precision. CM exhibited the highest precision value, with a 4% increase compared to IM. However, CM returned the lowest recall value, with a 15% decrease as compared to IM. All the combination models, except CsM, achieved a higher likelihood value than IM. CpM achieved the maximum likelihood gain of 4%.

Method level: Figure 3(b) shows the precision-recall curve for the method-level IP. Both CM and CrM gained precision over IM. CM showed a 13% increase while losing 5% in recall, whereas CrM showed a 4% increase without any loss in recall. CrM showed a 2% recall increase without any loss in precision.
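The one-way ANOVA test used for the hypothesis tests (Section E, α = 0.05) can be sketched as follows. The per-model recall samples here are invented for illustration, not taken from the study's measurements.

```python
from math import fsum

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    k = len(groups)                                    # number of models compared
    n = sum(len(g) for g in groups)                    # total number of observations
    grand_mean = fsum(x for g in groups for x in g) / n
    means = [fsum(g) / len(g) for g in groups]
    ss_between = fsum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = fsum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical per-transaction recall values for three models.
recall_im = [0.40, 0.35, 0.50, 0.45, 0.42]
recall_cm = [0.25, 0.20, 0.30, 0.28, 0.22]
recall_cqm = [0.43, 0.38, 0.52, 0.47, 0.44]

f = one_way_anova_f([recall_im, recall_cm, recall_cqm])
# With 2 and 12 degrees of freedom, the critical F value at alpha = 0.05 is
# roughly 3.89; an F statistic above it indicates a significant difference.
print(f > 3.89)  # → True
```

In practice one would obtain the p-value directly (e.g., `scipy.stats.f_oneway`) rather than comparing against a tabulated critical value.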
Both CpM and CqM exhibited a 2% increase in likelihood. Usually, an increase in the minimum support should increase precision and decrease recall; however, CM exhibited a decrease in precision at the support of 3, and recall was the same across the support values for the method-level IP. This exception suggests that the coupling is perhaps stronger among committed elements than among interacted ones, i.e., the consequences of not applying the required co-changes are more severe than those of missing co-interactions.

For the method-level IP, CpM and CqM resulted in almost identical performance for precision and recall. From equation (5) and equation (6), we can see that the numbers of transactions in the training datasets for CpM and CqM are practically very close. In the example in Section II.E, the cardinalities of P and Q are 18 and 17. This similarity in the training datasets of CpM and CqM resulted in the same performance across the three support values for the method-level IP and two out of the three support values for the file-level IP.

From Figure 5, we can see that there are statistically significant differences between CM and the other models in terms of precision at both the file and method levels. Therefore, we reject H0-1 and H0-3. There are also statistically significant differences between CM and the other models in terms of recall at the file-level granularity; however, there are none at the method level. Therefore, we reject H0-2 and accept H0-4.

2) Commit Prediction (CP): File level: From Figure 4(a), for the file-level CP, the combined models did not show any improvements in precision as compared to CM. However, CqM gained 13% in recall at the loss of 2% in precision. CrM also showed a 10% increase in recall with a 1% loss in precision. All the combination models achieved a higher likelihood value than CM. CpM achieved a maximum likelihood gain of 15%.

Method level: Figure 4(b) shows the precision-recall curve for the method-level CP.
In terms of precision, the other models did not perform as well as CM. However, CrM displayed a 4% increase in recall with a 13% decrease in precision. All the combination models achieved a higher likelihood value than CM. CpM achieved a maximum likelihood gain of 14%. CM exhibited an increase in recall at the support of 3 for the method-level CP.

From Figure 2, the numbers of file-level and method-level transactions for CM are much larger than the respective numbers of transactions for CsM. In the example provided in Section II.E, the cardinalities of C and S are 8 and 4. Despite this large difference in the numbers of training transactions, CsM resulted in a higher recall value for both file- and method-level IP and CP. This fact indicates that coupling is determined not only by the number of transactions in the training history but also by the number of elements per transaction. From equation (8), the training dataset for CsM includes some transactions from the interaction dataset, and interaction transactions contain a larger number of elements per transaction than commit transactions, which resulted in a higher recall for CsM than for CM.

From Figure 5, we can see that there are statistically significant differences between CM and the other models in terms of precision at both the file and method levels. There are statistically significant differences in precision between IM and each of CpM, CqM, and CrM at the method-level CP. Also, statistically significant differences were observed between CsM and each of CpM and CqM in terms of precision at the method-level CP. Therefore, we reject H0-5 and H0-7. There are also statistically significant differences between CM and each of CpM, CqM, and CrM in recall at the file level. Therefore, we reject H0-6 and accept H0-8.

G. Answering Research Questions (RQs)

To answer the research questions, the performances of the different prediction models on each of the four testing sets were examined.
In answering RQ1, the average performances of IM and CM for IP and CP across the three support thresholds were compared at the file and method levels of granularity. For the file-level IP, IM performed better than CM, with 16% and 18% improvements in recall and likelihood respectively; however, IM exhibited a 3% decrease in precision. For the method-level IP, CM performed better than IM, with an 11% increase in precision at a loss of 4% in recall. For the file-level CP, IM outperformed CM with a gain of 8% in recall and 9% in likelihood, at a loss of 2% in precision. For the method-level CP, CM outperformed IM with a gain of 24% in precision and 8% in recall, at a loss of 9% in likelihood. Overall, CM is better in precision and IM is better in recall, except for the method-level CP.

Figure 3. Precision vs. recall curves of the different prediction models for IP on the Mylyn project at minimum supports of 1, 2, and 3.

Figure 4. Precision vs. recall curves of the individual and combined models for CP on the Mylyn dataset at minimum supports of 1, 2, and 3. Parts (a) and (b) are for files and methods.

Figure 5. A heat map summarizing the hypothesis test results across all the prediction models for the minimum support of 1. Significant difference: cells colored black indicate a significant difference at both the method and file levels; dark gray at only the file level; light gray at only the method level; white at neither level.

In answering RQ2, the average performances of the combined models across the three thresholds were compared with the average performances of IM for IP and with the average performances of CM for CP. For the file-level IP, the combination models outperformed IM with 1%, 1%, and 4% gains in precision, recall, and likelihood respectively. For the method-level IP, the combination models outperformed IM with a gain of 4% in precision, 1% in recall, and 2% in likelihood. Overall, the combination models did not register a promising result for IP; their maximum performance gains were 4% in likelihood for the file-level IP and 4% in precision for the method-level IP. The combination models, however, showed a promising result for the file-level CP, displaying 11% and 15% increases in recall and likelihood with a 1% trade-off in precision. For the method-level CP, CM performed better than the combination models: it exhibited 19% and 1% gains in precision and recall respectively, with a loss of 14% in likelihood.

In answering RQ3, the average performances of the combined models across the three thresholds were compared for the file and method levels of IP and CP. For the file-level IP, the difference in precision among the four combined models is 1%, and the differences in recall are between 1% and 2%. For the method-level IP, the differences in precision among the four combined models are between 1% and 2%, and the difference in recall is at most 3%. For the file-level CP, the difference in precision among the four combined models is 1%, and the differences in recall are between 1% and 5%. For the method-level CP, the differences in precision among the four combined models are between 1% and 2%, and the difference in recall is at most 2%. Significant differences are observed between CpM and CsM, and between CqM and CsM, for the precision of the method-level CP. Overall, CpM and CqM are better in recall and likelihood, and CrM and CsM are better in precision.

IV. THREATS TO VALIDITY

We discuss internal, construct, and external threats to the validity of the results of our empirical study.

Incomplete or Missing Interaction History: Although a common period was considered for extracting the interaction and commit datasets in the Mylyn dataset, the number of commit transactions is significantly higher than the number of interaction transactions. This difference may not be the result of a single task being defined for multiple commits, because there are many cases in which committed files were never part of one of the corresponding interaction transactions.

Data Extraction Errors: We used two adequately vetted tools to extract method-level interaction and commit transactions; however, it is possible that the unforeseen error rates of the two tools differed.

CVS to SVN Conversion: We do not know the error rate of CVS2SVN when grouping individual CVS file revisions into change sets. It may erroneously split a commit into multiple commits, or group multiple commits into one. There are 1366 more commits than interaction traces for the same period. This difference could be due to errors introduced by CVS2SVN.

Explicit Bug Id Linkage: We considered interactions and commits to be related only if an explicit bug id was mentioned in them. Other, implicit relationships were not considered.

Training and Testing Set Split: We considered only a 75%:25% split between the training and testing sets. It is possible that a different split point could produce different results.

Single Period of History: We considered only the history between June 18, 2007 and July 01. It is possible that this history is not reflective of the optimum results for all the models. A different history period might produce different results in terms of their relative performance.

Performance Metrics: We considered the precision and recall metrics for the evaluation. One could also use other derived metrics, such as the F-measure; however, we wanted to analyze the performance differences with multiple orthogonal metrics.

Only One System Considered: Due to the lack of adequate Mylyn interaction histories for open source projects, our validation study was performed only on a single system written in Java. It was the one with the largest available dataset within the Eclipse Foundation, with over 2600 fixed bug reports that contained at least one interaction trace attachment. The second and third largest projects (the Eclipse Platform and Modeling) had about 700 and 450 such bugs. Nonetheless, this fact may limit the generality of our results.

V. RELATED WORK

We discuss related evolutionary coupling mining approaches. Our goal is not to exhaustively detail this large body of literature, but to briefly discuss a few representatives.

A. Evolutionary Couplings from Programmers' Interactions

A number of research efforts have used interaction information to mine evolutionary couplings. Researchers have developed IDE plug-ins to capture programmers' interactions during programming activities [10, 11, 21]. NavTracks, a complementary tool to the Eclipse Package Explorer, keeps track of the navigation history of software developers. The tool provides information concerning the recent actions of a programmer on a local copy of a development project; this information was used to mine IC at the file-level granularity [11]. Team Track [10] also records programmers' interactions with projects, files, classes, and members by continuously tracking the position of the mouse cursor every second. The information was then used to provide navigation support to programmers unfamiliar with the code base. In HeatMaps [21], the interestingness of a programming element is determined by computing a Degree-of-Interest (DOI) value based on its historical selection and modification. If an artifact is found interesting, it is decorated with colors to indicate its importance to the task. Zou et al. [13] used interaction history to identify evolutionary information about a development process, such as that restructuring is more costly than other maintenance activities. Robbes et al. [7] developed an incremental change-based repository, built by retrieving program information from an IDE, which includes more information about the evolution of a system than a traditional SCM, to identify refactoring events. Parnin and Gorg [26] identified methods relevant to the current task by using programmers' interactions with an IDE. Kobayashi et al. [15] presented a Change Guide Graph (CGC) based on interaction information to guide programmers to the location of the next change. Each node in the graph represents a changed artifact and each edge represents a relation between consecutive changes.
Following the CGC, the next target in the change sequence can be identified. Logical couplings have also been detected by combining interaction history with other sources of information about a program. Schneider et al. [19] presented a visual tool for mining local interaction histories to help address some of the awareness problems experienced in distributed software development projects. Both interaction history and static dependencies were used to provide a set of potentially interesting elements for a programming task. Change histories from SCMs, such as CVS and SVN, do not track the sequence of edits in a change set. Robbes et al. [4] proposed an alternative approach to predict sequential change couplings from recorded programmers' activities in the IDE, and used the data to evaluate existing change prediction approaches.

B. Evolutionary Couplings from Commits

Ying et al. [9] used an association rule mining algorithm to mine evolutionary couplings from commits. They also provide an interestingness value for each recommendation, which indicates its surprise factor, i.e., entities that are not apparent to a developer from their primitive knowledge of the source code. Canfora et al. [27] used both CVS and Bugzilla data to perform impact analysis. Their method exploits information retrieval algorithms to link the change request description and the set of historical source files in repositories; they use textual similarity to retrieve past change requests (CRs) similar to a new CR. Fluri et al. [5] focused on adding structural change information to release history data. They discarded changes related to textual modifications, such as updates in license terms, because these could indicate false couplings between files. Kagdi et al. [16, 28] provide a model that combines evolutionary couplings with estimated changes identified by traditional impact analysis techniques. Zimmermann et al. [17] presented a tool, named ROSE, to mine evolutionary couplings from CVS commits.
They used a sliding window technique to identify commits and applied association rule mining. Similar to Ying et al. [9] and Zimmermann et al. [17], our approach uses association rule mining to obtain evolutionary couplings from commits. Other approaches that use static analysis for impact analysis are discussed in [1, 29, 30], and those that use dynamic analysis are discussed in [2, 31, 32]; their discussion is out of scope here.

C. Comparison of Our Approach with Existing Approaches

From the above discussion, it can be seen that none of these approaches used combinations of interaction and commit histories for IP and CP. We presented four combined models of commit and interaction histories at the file and method levels. Also, we mined CC and IC from each of these combined datasets. We performed two different empirical comparisons: one


More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

How To Find Influence Between Two Concepts In A Network

How To Find Influence Between Two Concepts In A Network 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

More information

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2 nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition

More information

Empirical study of Software Quality Evaluation in Agile Methodology Using Traditional Metrics

Empirical study of Software Quality Evaluation in Agile Methodology Using Traditional Metrics Empirical study of Software Quality Evaluation in Agile Methodology Using Traditional Metrics Kumi Jinzenji NTT Software Innovation Canter NTT Corporation Tokyo, Japan jinzenji.kumi@lab.ntt.co.jp Takashi

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Baseline Code Analysis Using McCabe IQ

Baseline Code Analysis Using McCabe IQ White Paper Table of Contents What is Baseline Code Analysis?.....2 Importance of Baseline Code Analysis...2 The Objectives of Baseline Code Analysis...4 Best Practices for Baseline Code Analysis...4 Challenges

More information

SOFTWARE TESTING TRAINING COURSES CONTENTS

SOFTWARE TESTING TRAINING COURSES CONTENTS SOFTWARE TESTING TRAINING COURSES CONTENTS 1 Unit I Description Objectves Duration Contents Software Testing Fundamentals and Best Practices This training course will give basic understanding on software

More information

Product Line Development - Seite 8/42 Strategy

Product Line Development - Seite 8/42 Strategy Controlling Software Product Line Evolution An infrastructure on top of configuration management Michalis Anastasopoulos michalis.anastasopoulos@iese.fraunhofer.de Outline Foundations Problem Statement

More information

Identifying Market Price Levels using Differential Evolution

Identifying Market Price Levels using Differential Evolution Identifying Market Price Levels using Differential Evolution Michael Mayo University of Waikato, Hamilton, New Zealand mmayo@waikato.ac.nz WWW home page: http://www.cs.waikato.ac.nz/~mmayo/ Abstract. Evolutionary

More information

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions A Significance Test for Time Series Analysis Author(s): W. Allen Wallis and Geoffrey H. Moore Reviewed work(s): Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp.

More information

JOURNAL OF OBJECT TECHNOLOGY

JOURNAL OF OBJECT TECHNOLOGY JOURNAL OF OBJECT TECHNOLOGY Online at http://www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2007 Vol. 6, No. 1, January-February 2007 CM Configuration Change Management John D.

More information

sql-schema-comparer: Support of Multi-Language Refactoring with Relational Databases

sql-schema-comparer: Support of Multi-Language Refactoring with Relational Databases sql-schema-comparer: Support of Multi-Language Refactoring with Relational Databases Hagen Schink Institute of Technical and Business Information Systems Otto-von-Guericke-University Magdeburg, Germany

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information

tools that make every developer a quality expert

tools that make every developer a quality expert tools that make every developer a quality expert Google: www.google.com Copyright 2006-2010, Google,Inc.. All rights are reserved. Google is a registered trademark of Google, Inc. and CodePro AnalytiX

More information

Data Collection from Open Source Software Repositories

Data Collection from Open Source Software Repositories Data Collection from Open Source Software Repositories GORAN MAUΕ A, TIHANA GALINAC GRBAC SEIP LABORATORY FACULTY OF ENGINEERING UNIVERSITY OF RIJEKA, CROATIA Software Defect Prediction (SDP) Aim: Focus

More information

A Stock Pattern Recognition Algorithm Based on Neural Networks

A Stock Pattern Recognition Algorithm Based on Neural Networks A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo guoxinyu@icst.pku.edu.cn Xun Liang liangxun@icst.pku.edu.cn Xiang Li lixiang@icst.pku.edu.cn Abstract pattern respectively. Recent

More information

UNIVERSITY OF WATERLOO Software Engineering. Analysis of Different High-Level Interface Options for the Automation Messaging Tool

UNIVERSITY OF WATERLOO Software Engineering. Analysis of Different High-Level Interface Options for the Automation Messaging Tool UNIVERSITY OF WATERLOO Software Engineering Analysis of Different High-Level Interface Options for the Automation Messaging Tool Deloitte Inc. Toronto, ON M5K 1B9 Prepared By Matthew Stephan Student ID:

More information

Surround SCM Best Practices

Surround SCM Best Practices Surround SCM Best Practices This document addresses some of the common activities in Surround SCM and offers best practices for each. These best practices are designed with Surround SCM users in mind,

More information

On Correlating Performance Metrics

On Correlating Performance Metrics On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are

More information

SOFTWARE PROCESS MINING

SOFTWARE PROCESS MINING SOFTWARE PROCESS MINING DR. VLADIMIR RUBIN LEAD IT ARCHITECT & CONSULTANT @ DR. RUBIN IT CONSULTING LEAD RESEARCH FELLOW @ PAIS LAB / HSE ANNOTATION Nowadays, in the era of social, mobile and cloud computing,

More information

Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction

Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Huanjing Wang Western Kentucky University huanjing.wang@wku.edu Taghi M. Khoshgoftaar

More information

JRefleX: Towards Supporting Small Student Software Teams

JRefleX: Towards Supporting Small Student Software Teams JRefleX: Towards Supporting Small Student Software Teams Kenny Wong, Warren Blanchet, Ying Liu, Curtis Schofield, Eleni Stroulia, Zhenchang Xing Department of Computing Science University of Alberta {kenw,blanchet,yingl,schofiel,stroulia,xing}@cs.ualberta.ca

More information

An Oracle White Paper September 2011. Oracle Team Productivity Center

An Oracle White Paper September 2011. Oracle Team Productivity Center Oracle Team Productivity Center Overview An Oracle White Paper September 2011 Oracle Team Productivity Center Overview Oracle Team Productivity Center Overview Introduction... 1 Installation... 2 Architecture...

More information

CODE ASSESSMENT METHODOLOGY PROJECT (CAMP) Comparative Evaluation:

CODE ASSESSMENT METHODOLOGY PROJECT (CAMP) Comparative Evaluation: This document contains information exempt from mandatory disclosure under the FOIA. Exemptions 2 and 4 apply. CODE ASSESSMENT METHODOLOGY PROJECT (CAMP) Comparative Evaluation: Coverity Prevent 2.4.0 Fortify

More information

The Real Challenges of Configuration Management

The Real Challenges of Configuration Management The Real Challenges of Configuration Management McCabe & Associates Table of Contents The Real Challenges of CM 3 Introduction 3 Parallel Development 3 Maintaining Multiple Releases 3 Rapid Development

More information

CS229 Project Report Automated Stock Trading Using Machine Learning Algorithms

CS229 Project Report Automated Stock Trading Using Machine Learning Algorithms CS229 roject Report Automated Stock Trading Using Machine Learning Algorithms Tianxin Dai tianxind@stanford.edu Arpan Shah ashah29@stanford.edu Hongxia Zhong hongxia.zhong@stanford.edu 1. Introduction

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

Module 10. Coding and Testing. Version 2 CSE IIT, Kharagpur

Module 10. Coding and Testing. Version 2 CSE IIT, Kharagpur Module 10 Coding and Testing Lesson 26 Debugging, Integration and System Testing Specific Instructional Objectives At the end of this lesson the student would be able to: Explain why debugging is needed.

More information

Oracle Real Time Decisions

Oracle Real Time Decisions A Product Review James Taylor CEO CONTENTS Introducing Decision Management Systems Oracle Real Time Decisions Product Architecture Key Features Availability Conclusion Oracle Real Time Decisions (RTD)

More information

Neovision2 Performance Evaluation Protocol

Neovision2 Performance Evaluation Protocol Neovision2 Performance Evaluation Protocol Version 3.0 4/16/2012 Public Release Prepared by Rajmadhan Ekambaram rajmadhan@mail.usf.edu Dmitry Goldgof, Ph.D. goldgof@cse.usf.edu Rangachar Kasturi, Ph.D.

More information

Discovering loners and phantoms in commit and issue data

Discovering loners and phantoms in commit and issue data Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2015 Discovering loners and phantoms in commit and issue data Schermann, Gerald;

More information

Error Log Processing for Accurate Failure Prediction. Humboldt-UniversitΓ€t zu Berlin

Error Log Processing for Accurate Failure Prediction. Humboldt-UniversitΓ€t zu Berlin Error Log Processing for Accurate Failure Prediction Felix Salfner ICSI Berkeley Steffen Tschirpke Humboldt-UniversitΓ€t zu Berlin Introduction Context of work: Error-based online failure prediction: error

More information

An Experiment on the Effect of Design Recording on Impact Analysis

An Experiment on the Effect of Design Recording on Impact Analysis An Experiment on the Effect of Design Recording on Impact Analysis F. Abbattista, F. Lanubile, G. Mastelloni, and G. Visaggio Dipartimento di Informatica University of Bari, Italy Abstract An experimental

More information

Data Migration Service An Overview

Data Migration Service An Overview Metalogic Systems Pvt Ltd J 1/1, Block EP & GP, Sector V, Salt Lake Electronic Complex, Calcutta 700091 Phones: +91 33 2357-8991 to 8994 Fax: +91 33 2357-8989 Metalogic Systems: Data Migration Services

More information

Characterizing and Predicting Blocking Bugs in Open Source Projects

Characterizing and Predicting Blocking Bugs in Open Source Projects Characterizing and Predicting Blocking Bugs in Open Source Projects Harold Valdivia Garcia and Emad Shihab Department of Software Engineering Rochester Institute of Technology Rochester, NY, USA {hv1710,

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Eventia Log Parsing Editor 1.0 Administration Guide

Eventia Log Parsing Editor 1.0 Administration Guide Eventia Log Parsing Editor 1.0 Administration Guide Revised: November 28, 2007 In This Document Overview page 2 Installation and Supported Platforms page 4 Menus and Main Window page 5 Creating Parsing

More information

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Amir H. Moin and GΓΌnter Neumann Language Technology (LT) Lab. German Research Center for Artificial Intelligence (DFKI)

More information

Deposit Identification Utility and Visualization Tool

Deposit Identification Utility and Visualization Tool Deposit Identification Utility and Visualization Tool Colorado School of Mines Field Session Summer 2014 David Alexander Jeremy Kerr Luke McPherson Introduction Newmont Mining Corporation was founded in

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

Mining Metrics to Predict Component Failures

Mining Metrics to Predict Component Failures Mining Metrics to Predict Component Failures Nachiappan Nagappan, Microsoft Research Thomas Ball, Microsoft Research Andreas Zeller, Saarland University Overview Introduction Hypothesis and high level

More information

Software Configuration Management. Context. Learning Objectives

Software Configuration Management. Context. Learning Objectives Software Configuration Management Wolfgang Emmerich Professor of Distributed Computing University College London http://sse.cs.ucl.ac.uk Context Requirements Inception Elaboration Construction Transition

More information

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Amir H. Moin and GΓΌnter Neumann Language Technology (LT) Lab. German Research Center for Artificial Intelligence (DFKI)

More information

Measurement Information Model

Measurement Information Model mcgarry02.qxd 9/7/01 1:27 PM Page 13 2 Information Model This chapter describes one of the fundamental measurement concepts of Practical Software, the Information Model. The Information Model provides

More information

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe

More information

WIRD AG Solution Proposal Project- & Portfolio-Management

WIRD AG Solution Proposal Project- & Portfolio-Management WIRD AG Solution Proposal Project- & Portfolio-Management Overview In order to address the need to control resources, time and cost in projects and in order to develop applications for System z, Wird AG,

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity

A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity Dongwon Kang 1, In-Gwon Song 1, Seunghun Park 1, Doo-Hwan Bae 1, Hoon-Kyu Kim 2, and Nobok Lee 2 1 Department

More information

Creating Short-term Stockmarket Trading Strategies using Artificial Neural Networks: A Case Study

Creating Short-term Stockmarket Trading Strategies using Artificial Neural Networks: A Case Study Creating Short-term Stockmarket Trading Strategies using Artificial Neural Networks: A Case Study Bruce Vanstone, Tobias Hahn Abstract Developing short-term stockmarket trading systems is a complex process,

More information

EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN ACCELERATORS AND TECHNOLOGY SECTOR

EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN ACCELERATORS AND TECHNOLOGY SECTOR EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN ACCELERATORS AND TECHNOLOGY SECTOR CERN-ATS-2011-213 THE SOFTWARE IMPROVEMENT PROCESS - TOOLS AND RULES TO ENCOURAGE QUALITY K. Sigerud, V. Baggiolini, CERN,

More information

Component visualization methods for large legacy software in C/C++

Component visualization methods for large legacy software in C/C++ Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ MΓ‘tΓ© CserΓ©p a, DΓ‘niel Krupp b a EΓΆtvΓΆs LorΓ‘nd University mcserep@caesar.elte.hu

More information

OMBEA Response. User Guide ver. 1.4.0

OMBEA Response. User Guide ver. 1.4.0 OMBEA Response User Guide ver. 1.4.0 OMBEA Response User Guide Thank you for choosing an Audience Response System from OMBEA. Please visit www.ombea.com to get the latest updates for your system. Copyright

More information

Performance evaluation of Web Information Retrieval Systems and its application to e-business

Performance evaluation of Web Information Retrieval Systems and its application to e-business Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel ViΓ±a Departament of Information and Comunications Technologies Facultad de InformΓ‘tica,

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

A Business Process Driven Approach for Generating Software Modules

A Business Process Driven Approach for Generating Software Modules A Business Process Driven Approach for Generating Software Modules Xulin Zhao, Ying Zou Dept. of Electrical and Computer Engineering, Queen s University, Kingston, ON, Canada SUMMARY Business processes

More information