Using Decay Mechanism to Improve Regression Test Selection Techniques in Continuous Integration Development Environment
Using Decay Mechanism to Improve Regression Test Selection Techniques in Continuous Integration Development Environment

Jingjing Liang
Dept. of Computer Science & Engineering
University of Nebraska-Lincoln
Lincoln, NE

ABSTRACT
In a continuous integration development environment, changed code is committed and merged at frequent intervals. This approach reduces repetitive manual work and speeds up overall development time. To make sure that those changes will not break the integration build, it is necessary to test the changed code prior to submission and detect as many faults as possible. In this work, focusing on the pre-submit testing phase, we present a new regression test selection (RTS) technique that adopts a decay mechanism to select a subset of test suites for execution. Under the assumption that recent changes in a data stream provide more valuable information for the analysis of newly changed code, while the effect of old test suite results on the current testing outcome diminishes, decay-based RTS relies more heavily on the recent history of test suite records. To evaluate the performance of the technique, we conducted an empirical study on the Google Shared Dataset of Test Suite Results (GSDTSR) to simulate an industrial testing process. The results show that our new technique can detect a high percentage of failures while executing a very low percentage of test suites.

General Terms
Reliability, Experimentation

Keywords
Recent frequent failed tests, Regression test selection, Decay mechanism, Continuous integration development environment, Google Shared Dataset

1. INTRODUCTION
In a continuous integration (CI) development environment, developers commit and merge code at frequent intervals. The merged code is then regression tested to ensure that the changes do not break the integration build and that the codebase remains stable [3, 5]. This CI practice reduces repetitive manual work and speeds up overall development time. Although the CI process has many advantages, it also faces several challenges. For example, to achieve the benefits of CI, developers must commit code more frequently than before. Potential errors or collisions, which may result in broken integration builds, must be detected as soon as possible. Broken builds must be fixed immediately. Notification systems are needed to improve coordination and reduce conflicts. Tests should be executed as part of the automated build [5]. In addition to these challenges, cost-effectiveness is another important aspect of development in a CI environment. For example, in order to release products rapidly, code is merged more frequently, which requires faster feedback from each build. To address this, many large organizations use two testing phases [5]. (1) Post-submit testing phase. After developers have submitted their code to the version control repository, the CI server builds it and runs the related scripts to test it. In this phase, developers provide a change list along with the changed code.
In the change list, they indicate the modules directly relevant to building or testing, so that the number of executed test suites can be restricted. In this phase, the focus is on detecting failures as soon as possible, so that developers can fix those problems sooner. (2) Pre-submit testing phase. Even though the post-submit testing phase already restricts the number of test suites, the changed code may still interact with a large amount of dependent code and modules, so there may still be a high possibility that the build will fail. To reduce this risk, it is necessary for developers to test their code prior to submission. In this phase, the focus is on detecting as many failures as possible, so as to reduce the possibility of failing the build after submission. In addition, as an important part of a build, regression testing should be conducted cost-effectively. Regression testing is the process of testing existing software applications to make sure that a change or an addition does not break any existing functionality. In a CI development environment, regression testing is quite necessary. As one regression testing technique, regression test selection (RTS), which selects part of the test suite for execution, can be more cost-effective. However, traditional RTS is not applicable here: traditional analysis relies on code instrumentation and applies only to a discrete
and complete set of test suites [5]. In CI, however, testing requests arrive at frequent intervals, which makes such analysis very expensive, and the continuous arrival of test suites makes it hard to treat them as a discrete and complete set. Therefore, to make testing cost-effective in the pre-submit testing phase, we provide a new approach for selecting a subset of test suites for regression testing.

In CI, large numbers of test suites arrive very frequently during testing. Consequently, the knowledge embedded in the data stream is likely to change as time goes by, and the more recent changes of the data stream provide more valuable information for selecting among the incoming test suites. For example, if a certain test suite A frequently fails on recent versions of the code, it is more likely to indicate potential problems in the corresponding changed code modules. Conversely, if another test suite B has rarely failed recently, it is more likely to indicate that the corresponding code functions well. For both A and B, even if A always passed in previous versions or B always failed in previous versions, we care more about their recent performance. As time goes by, the effect of old fail/pass statuses of the test suites gradually decreases and the effect of recent fail/pass statuses gradually increases.

Under the assumption that recent changes in a data stream provide more valuable information for the analysis of newly changed code, while the effect of old test suite results on the current testing outcome diminishes, this work presents a new regression test selection (RTS) technique that adopts a decay mechanism to select a subset of test suites for execution. To evaluate the performance of the technique, we conducted an empirical study on the Google Shared Dataset of Test Suite Results (GSDTSR) to simulate an industrial testing process. The results show that our new technique can detect a high percentage of failures while executing a very low percentage of test suites. Thus, our technique contributes directly to the goals of the continuous integration process.

The remainder of this paper is organized as follows. Section 2 provides background and related work. Section 3 presents our new decay-based RTS technique. Section 4 presents the design and results of our study. Section 5 discusses the findings and limitations of this paper. Section 6 concludes.

2. BACKGROUND AND RELATED WORK
This section provides background for this paper. Section 2.1 discusses the CI environment. Section 2.2 discusses regression test selection and related work. Section 2.3 discusses the decay mechanism and related work. Section 2.4 discusses the Google Shared Dataset of Test Suite Results.

2.1 Continuous Integration Development Environment
Continuous integration is a software development practice in which the members of a team integrate their work at very frequent intervals. Each integration is verified by an automated build (including tests) to detect integration errors and give feedback as quickly as possible. This approach has been found to significantly reduce integration problems and to help a team develop cohesive software more rapidly [1]. A typical CI process, whose components are shown in Figure 1, proceeds as follows:
1. Each developer commits code to the version control repository. Meanwhile, the CI server on the integration build machine polls the repository to check whether any changes have occurred (e.g., every few minutes).
2. When a commit occurs, the CI server detects the changed code version, retrieves a copy of the changed code from the repository, and executes the related build script to test it and merge it into the codebase.
3. When the build completes, the CI server generates a report about the result and informs the developer through the feedback mechanism.
4. The CI server continues to poll for changes in the version control repository and repeats the previous steps.

Figure 1. Components of a Continuous Integration System

CI has many advantages: all of the processes are automated, which significantly reduces repetitive manual work, and the fast feedback on each integration build produced by the CI system can improve the quality of the project process and reduce overall development time. As a result, many organizations increasingly use continuous integration to improve their development environments and shorten overall development time so as to release new products rapidly [2, 3, 4, 5].
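To make the polling workflow concrete, the following minimal Python sketch outlines steps 1 through 4; the use of git, a build.sh script, and a notify callback are illustrative assumptions rather than details of any particular CI server.

import subprocess
import time

POLL_INTERVAL_SECONDS = 120  # poll "every few minutes"

def head_revision(repo):
    # Ask the version control repository for its latest revision.
    out = subprocess.run(["git", "-C", repo, "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def build_and_test(repo):
    # Run the project's build script, which builds and tests the change.
    return subprocess.run(["./build.sh"], cwd=repo).returncode == 0

def ci_server_loop(repo, notify):
    # Steps 2-4: detect commits, build and test, report, keep polling.
    last_seen = head_revision(repo)
    while True:
        time.sleep(POLL_INTERVAL_SECONDS)
        current = head_revision(repo)
        if current != last_seen:
            succeeded = build_and_test(repo)
            notify(current, succeeded)  # feedback to the developer
            last_seen = current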
2.2 Regression Test Selection
In a CI development environment, code integration becomes more frequent than before, which makes it more necessary for developers to test the functionality of the changed code and to guarantee that all the old functionality still works. Regression testing is the process of testing existing software applications to make sure that a change or addition has not broken any existing functionality. For example, when a new version of a product is about to be released, the old test suites are still run against the new version to ensure that all the old capabilities still work. Regression testing can be approached with the following three classes of techniques (Figure 2):

Figure 2. Regression Testing

1. Retest all. All the tests in the existing test bucket or suite are re-executed. This is very expensive, as it requires huge amounts of time and resources.
2. Regression test selection (RTS). Instead of re-executing the entire test suite, selecting part of the test suite for execution is more cost-effective.
3. Test case prioritization (TCP). The test suites are prioritized according to some criteria.

In this paper, we focus on the pre-submit testing phase, aiming to cost-effectively detect as many failures as possible. We therefore choose regression test selection, which selects a subset of test suites related to the changed code and maximizes failure detection. Even though this technique may miss some test suites that could reveal potential faults, all the related test suites will still be executed in the post-submit testing phase, and those faults will be detected then [5].

There has been some recent work on techniques for testing programs on large farms of test servers or in the cloud [11, 12, 13]; however, those works did not specifically consider the continuous integration process or regression testing. Saff and Ernst focus on testing during the development effort itself, rather than after a set of changes has been completed and scheduled for submission and merging. Yoo et al. [18], also working with Google data sets, describe a search-based approach for using TCP techniques to improve the cost-effectiveness of testing in the pre-submit phase; their study does not, however, consider the use of RTS techniques. Elbaum et al. [5] worked on Google data sets and improved RTS and TCP techniques by using time windows to adapt to the CI development environment. They adopted two time windows: a failure window (Wf) and an execution window (We). If a test suite failed recently (within the failure window), it is selected for execution; likewise, if a test suite has not been executed for a long time (outside the execution window), it is selected for execution. In this paper, by contrast, we adopt a decay mechanism to improve the RTS technique.
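As a point of reference for the comparison in Section 4, the following sketch captures the windowed selection rule just described; the record structure and the window units (hours) are assumptions for illustration, not Elbaum et al.'s implementation.

from dataclasses import dataclass
from typing import Optional

@dataclass
class SuiteHistory:
    last_failure: Optional[float]    # hours; None if the suite never failed
    last_execution: Optional[float]  # hours; None if never executed

def window_select(hist, now, w_f=48.0, w_e=24.0):
    # Select a suite that failed within the failure window Wf, or that
    # has not been executed within the execution window We.
    recently_failed = (hist.last_failure is not None
                       and now - hist.last_failure <= w_f)
    stale = hist.last_execution is None or now - hist.last_execution > w_e
    return recently_failed or stale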
2.3 Decay Mechanism
In a continuous integration development environment, a large number of test suites are executed continuously at a rapid rate during testing. Consequently, the knowledge embedded in the stream of data is likely to change as time goes by. Identifying the recent changes of a data stream provides more valuable information for the analysis of newly changed code, since the effect of old test suites (data transactions) on the incoming data stream diminishes. However, most frequency-counting algorithms over data streams [7, 8], like the Lossy Counting algorithm [7], do not differentiate recently generated transactions from older information, which may no longer be useful or may even be invalid at present [6]. In terms of differentiating information, the SWF algorithm [9] uses a sliding window to find frequent itemsets in a fixed number of recent transactions. The sliding window is a sequence of partitions, and each partition maintains a number of transactions. Candidate 2-itemsets over the transactions in the window are maintained separately; when the window advances, the itemsets of the oldest partition are disregarded and new itemsets are generated. This paper adopts a decay mechanism to improve the traditional regression test selection technique in a CI development environment. The technique examines each test suite (transaction) one by one without any candidate generation. The effect of old test suites (transactions) on the current testing result is diminished by decaying the old occurrence count of each test suite as time goes by.

2.4 Google Shared Dataset of Test Suite Results (GSDTSR)
To conduct this work empirically, we cannot access a real industrial development process, so we use the records in GSDTSR to simulate the process of a real continuous testing environment. The Google Shared Dataset of Test Suite Results (GSDTSR) [15] provides the software testing and analysis community with a sample of 3.5 million test suite execution results from a fast and large-scale continuous testing infrastructure. The information in this dataset is shown in Table 1.

Table 1. Fields in GSDTSR

Field            Description
Test Suite       Test suite name.
Change Request   Rescaled change request number that led to the execution of the test suite.
Stage            Testing stage: Pre or Post.
Status           Test suite execution status: Failed or Passed.
Launch Time      The time when the test suite was executed.
Execution Time   Test suite execution time in milliseconds.
Size             The size of the test suite: Small, Medium, or Large.
Shard Number     Shards are needed when test suites are parallelized for execution; this is the number of shards used to execute the test suite.
Language         Language of the test suite.

The Google dataset covers two testing stages: pre-submit and post-submit. When a developer commits a changed module, the developer first tests prior to submission. In this phase, the developer provides a change list, which contains the modules directly relevant to building and testing; the testing request is then queued to execute all test suites relevant to the change list, and after the build the developer receives a report about the build and testing. If pre-submit testing succeeds, the developer submits the module for post-submit testing. In this phase, module dependency graphs are used to determine the test suites for execution: modules that are globally relevant to the changed module are all included, and all the test suites relevant to these modules are queued for testing.
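To replay these records as a stream, one can read them in arrival order; the following is a minimal sketch, assuming the dataset is available as a CSV file named gsdtsr.csv whose column headers match the field names in Table 1.

import csv

def stream_gsdtsr(path="gsdtsr.csv"):
    # Yield pre-submit records in arrival order; column names follow Table 1.
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["Stage"] == "Pre":  # pre-submit testing records only
                yield {
                    "suite": row["Test Suite"],
                    "status": row["Status"],  # "Failed" or "Passed"
                    "exec_ms": float(row["Execution Time"]),
                }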
3. APPROACH
It is advantageous to apply a continuous integration system for frequent system builds, together with regression testing of new and changed versions of the code, so that integration happens early, errors are detected sooner, and feedback on potential problems arrives faster. The pre-submit testing phase helps reduce the number of problems that reach the post-submit testing phase and fail the builds. However, the process still faces other challenges. For example, as the project grows, testing still has to deal with a large amount of dependent code and modules, which relate to large numbers of test suites. In this paper, we focus on making the pre-submit testing phase cost-effective by applying a decay mechanism. In this section, I first introduce the decay model I use; then I provide the algorithm I implemented on the Google dataset and explain it in detail.

3.1 Using a Decay Mechanism to Find Frequently Failing Test Suites
In a CI development environment, it is quite important to detect as many failing test suites as possible, so that developers can fix those problems prior to submission and reduce potential build breaks. To make the pre-submit testing phase cost-effective, we apply a regression test selection (RTS) technique. However, traditional RTS is not applicable: Google's codebase undergoes a large number of changes per minute [14], and traditional analysis relies on code instrumentation and applies only to a discrete and complete set of test suites [5]. In CI, testing requests arrive at frequent intervals, which makes such analysis very expensive, and the continuous arrival of test suites makes it hard to treat them as a discrete and complete set. We therefore provide a new approach for selecting a subset of test suites for regression testing.

In CI, large numbers of test suites arrive at very frequent intervals during testing. Consequently, the knowledge embedded in the data stream is likely to change as time goes by, and the more recent changes of the data stream provide more valuable information for selecting among the incoming test suites. For example, for a certain test suite A, even though A rarely failed previously, if it fails frequently in recent runs, it may indicate problems in the recently changed code. On the other hand, even if test suite B frequently failed previously, if it has not failed on the recent code changes, it may indicate that the related changed code functions well. From this example, we can see that as time goes by, the effect of old fail/pass statuses of the test suites gradually decreases and the effect of recent fail/pass statuses gradually increases. To capture this variation in the influence of frequently failing test suites, we use the decay mechanism:

(1) Let T = {t1, t2, ..., tn} be the set of test suites to be executed. Each test suite can be executed over and over again if it is selected.
(2) A selected set of test suites T' satisfies T' ∈ 2^T \ {∅}, where 2^T is the power set of T (each test suite ti in T has two possibilities: selected or not selected).
(3) A transaction is a subset of T, and each transaction has a unique transaction identifier T_id. The transaction generated at the k-th turn is denoted T_k.
(4) When a new transaction T_k is generated, the current data stream D_k is composed of all transactions that have been generated so far, i.e., D_k = <T_1, T_2, ..., T_k>, and the total number of transactions in D_k is denoted |D_k|.
(5) When a transaction T_k is generated, the current count C_k(T_i) of a subset T_i is the number of transactions among the k transactions that contain T_i.

Decay rate: the decay rate is the rate at which a weight is reduced per fixed decay-unit. A decay-unit determines the chunk of information to be decayed together [6].
A decay rate is defined by two factors:
(1) A decay-base b, which determines the amount of weight reduction per decay-unit, with b > 1.
(2) A decay-base-life h: when the weight of the current information is set to 1, h is the number of decay-units after which that weight becomes b^(-1), with h >= 1.

Predefined minimum support (transaction count) δ: not all arriving test suites are significant for finding potential faults. A test suite with much less support than a predefined minimum support need not be monitored, since it is unlikely to detect failures in the near future. When the estimated support of an arriving test suite is large enough, it is regarded as a significant test suite and is selected for execution; otherwise, the test suite is skipped.

The decay rate d is then defined as:

d = b^(-1/h)   (b > 1, h >= 1, b^(-1) <= d < 1)

Given a decay rate d, the total number of transactions |D_k| in the current data stream D_k is found as follows:

|D_k| = 1                    if k = 1
|D_k| = |D_(k-1)| * d + 1    if k >= 2

When the first transaction is generated, |D_1| is obviously 1, since there exists no previous transaction whose weight should be decayed. As the next transaction comes, |D_2| = |D_1| * d + 1, since the weight of the first transaction is decayed by the decay rate. When a new transaction is generated at the k-th turn (k >= 2), the total number of transactions is:

|D_k| = |D_(k-1)| * d + 1 = (|D_(k-2)| * d + 1) * d + 1 = ... = d^(k-1) + d^(k-2) + ... + d + 1 = (1 - d^k) / (1 - d)

Because b^(-1) <= d < 1, |D_k| converges to 1/(1 - d) as k increases infinitely. The count C_k(T_i) of a subset T_i (a selected set of tests) in the current data stream D_k is then maintained as:

C_k(T_i) = C_(k-1)(T_i) * d + W(T_i),   where W(T_i) = 1 if T_i ∈ T_k and 0 otherwise

(in our setting, T_i ∈ T_k when T_i failed at the k-th turn).
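As a quick numeric check of the closed form above (a sketch; the values b = 2 and h = 2 match the worked example in Section 3.2):

def total_transactions(k, d):
    # |D_k| via the recurrence |D_1| = 1, |D_k| = |D_(k-1)| * d + 1.
    total = 1.0
    for _ in range(k - 1):
        total = total * d + 1.0
    return total

d = 2 ** (-1 / 2)                  # b = 2, h = 2  =>  d ~ 0.7071
print(total_transactions(5, d))    # 2.8107..., matches (1 - d^5)/(1 - d)
print(total_transactions(50, d))   # ~3.4142, already near the limit
print(1 / (1 - d))                 # limit 1/(1 - d) ~ 3.4142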
3.2 Algorithm of Decay-Mechanism-Improved Regression Test Selection
The algorithm uses the decay model above to select, for regression testing, the test suites with more potential influence on the current result.

Algorithm: Decay-SelectionPreTests   // select test suites
C_1(T_i) = 1
T_k = ∅
for all T_i ∈ T do
  if C_k(T_i) >= δ or T_i is new then
    T_k ← T_k ∪ {T_i}   // execute it
    // update transaction count
    if T_i failed then
      C_k(T_i) = C_(k-1)(T_i) * d + 1
    else
      C_k(T_i) = C_(k-1)(T_i) * d + 0
    end if
  end if
end for
return T_k

This algorithm contains two main parts. The first part selects the test suites for execution; the second part updates the transaction count. At initialization, for each T_i, the transaction count is C_1(T_i) = 1, k = 1, and T_k = ∅. Then, as each test suite T_i arrives, we check:
(1) Whether the test suite is new. If yes, we execute it, add it to T_k, and update its transaction count.
(2) Whether the test suite's C_k(T_i) is at least the independent variable δ, which determines whether the test suite will be executed. If yes, we execute it, add it to T_k, and update its transaction count.
After deciding whether to execute the test suite, we update its transaction count according to the result of the test execution. Since we conjecture that test suites with failure records are more likely to indicate problematic code churn, if the test execution fails, the current transaction count becomes the decayed previous transaction count plus one; if the test execution passes, the current transaction count is simply the decayed previous transaction count. The formulas are as follows:
(1) If the test suite failed, then T_i ∈ T_k, and we update C_k(T_i) = C_(k-1)(T_i) * d + 1.
(2) If the test suite passed, we update C_k(T_i) = C_(k-1)(T_i) * d + 0.

Take T_i as an example. Suppose b = 2 and h = 2; then d = b^(-1/h) = 2^(-1/2) ≈ 0.707. Suppose δ = 0.7 and T_i is not new. If C_(k-1)(T_i) = 1, then since 1 > δ, T_i should be executed. If the execution result is failed, C_k(T_i) is updated to 1 * d + 1 ≈ 1.707; if the execution result is passed, C_k(T_i) is updated to 1 * d + 0 ≈ 0.707.
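The following is a runnable Python sketch of Decay-SelectionPreTests; a suite absent from the count table is treated as new, and the record format (suite name plus pass/fail flag) is an assumption for illustration.

class DecaySelectionPreTests:
    # Decay-based regression test selection (Section 3.2).
    # counts[suite] stores C_k(T_i); a suite not yet in counts is new.

    def __init__(self, b=2.0, h=2.0, delta=0.7):
        assert b > 1 and h >= 1
        self.d = b ** (-1.0 / h)  # decay rate d = b^(-1/h)
        self.delta = delta        # predefined minimum support
        self.counts = {}

    def select(self, suite):
        # Execute the suite if it is new or its count is at least delta.
        return suite not in self.counts or self.counts[suite] >= self.delta

    def update(self, suite, failed):
        # Called only for executed suites: decay the old count and
        # add 1 on a failure (W(T_i) = 1), 0 on a pass.
        prev = self.counts.get(suite, 1.0)  # C_1(T_i) = 1 for a new suite
        self.counts[suite] = prev * self.d + (1.0 if failed else 0.0)

With b = 2, h = 2, and delta = 0.7, this reproduces the worked example: a suite with count 1 is selected, and its count becomes about 1.707 after a failure or about 0.707 after a pass.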
4. EMPIRICAL STUDY
To evaluate the cost-effectiveness of my new algorithm, I set several dependent and independent variables and simulated the progress of Google's pre-submit testing phase to select and execute test suites.

RQ1: How cost-effective is decay-based RTS during pre-submit testing, and how does the cost-effectiveness vary with different settings of the decay rate d and δ?
RQ2: How does the cost-effectiveness of decay-based RTS compare with the previous time-window-based RTS?

In this section, I first provide a general introduction to the object of my study, GSDTSR. In Section 4.2, I present the independent and dependent variables for this study. In Section 4.3, I discuss the operation of my study. In Section 4.4, I consider the potential threats to external and internal validity, and in Section 4.5 I analyze the results.

4.1 Object of Analysis
As the object of this experiment, I use the Google Shared Dataset of Test Suite Results (GSDTSR), which contains a sample of 3.5 million test suite execution results from a fast and large-scale continuous testing infrastructure [15].

4.2 Variables and Measures
Independent variables. In my empirical study, the independent variables involve the technique, the decay rate, and the predefined minimum support.
Technique: the decay-mechanism-based RTS presented in Section 3.2.
Decay rate: as presented before, the decay rate is d = b^(-1/h) (b > 1, h >= 1, b^(-1) <= d < 1). I choose a fixed decay-base b = 2 and three decay-base-life values h = {6, 10, 50}, representing different numbers of decay-units that make the current weight become b^(-1).
Predefined minimum support: as presented before, the predefined minimum support is used to determine whether a test suite retains enough potential to detect failures in the future. I choose 11 predefined minimum supports δ = {0.5, 1, 2, 4, 8, 12, 16, 20, 24, 28, 32}, representing different transaction counts.

Dependent variables. As dependent variables, I measure the percentage of test suites that are selected, the percentage of execution time required, and the percentage of failures detected by the technique. I do this for each combination of h and δ.

4.3 Study Operation
To evaluate my proposed technique, I implemented the algorithm described in Section 3.2. As the object, I use the GSDTSR dataset to simulate a continuous testing environment. The Decay-SelectionPreTests implementation takes the GSDTSR data, a decay rate d, and a predefined minimum support δ, and selects the test suites with a higher possibility of detecting failures. The report of the results contains three parts: the number of test suites selected, the total execution time of the selected test suites, and the number of failures detected. For each arriving test suite, the program uses the proposed technique to check whether that test suite should be selected. If the test suite is to be executed, the number of selected test suites and the total execution time of the selected test suites are updated; if the result of the test suite is failed, the number of failures detected is also updated.
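Putting the pieces together, the study driver described above can be sketched as follows, computing the three dependent variables; stream_gsdtsr (Section 2.4) and DecaySelectionPreTests (Section 3.2) are the hypothetical helpers sketched earlier.

def run_study(records, selector):
    # Replay records through the selector and report the percentage of
    # test suites selected, execution time required, and failures detected.
    selected = sel_time = sel_failures = 0
    total = total_time = total_failures = 0
    for rec in records:
        total += 1
        total_time += rec["exec_ms"]
        failed = rec["status"] == "Failed"
        total_failures += failed
        if selector.select(rec["suite"]):
            selected += 1
            sel_time += rec["exec_ms"]
            sel_failures += failed
            selector.update(rec["suite"], failed)  # counts change only on execution
    pct = lambda part, whole: 100.0 * part / whole if whole else 0.0
    return (pct(selected, total), pct(sel_time, total_time),
            pct(sel_failures, total_failures))

For example, run_study(stream_gsdtsr(), DecaySelectionPreTests(b=2, h=10, delta=8)) would produce one point in Figures 3 through 5.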
4.4 Threats to Validity
External validity. Even though I have applied my implementation to the Google dataset, which contains 3.5 million records of test results, the dataset still represents only a small section of one industrial setting. I have used several combinations of decay rates and predefined minimum supports to evaluate my results; however, I have not considered factors related to the availability of computing infrastructure, such as the number of platforms available for use in testing.
Internal validity. Since I am using my own tool for the experimental study, any faults in my tool would make the results unpersuasive. To avoid this scenario, I carefully unit tested my tool on small portions of the dataset. In addition, for the 3.5 million test results, I did not account for possibly flaky test suites, which may produce different results on different runs.

4.5 Results and Analysis
In this section, I analyze the results of my study according to the two research questions.

RQ1: How cost-effective is decay-based RTS during pre-submit testing?
Results of the implementation are shown in Figure 3, Figure 4, and Figure 5. Each of those figures shows, for each δ (on the x-axis), the percentage of test suites selected (%TestSuite), the execution time of those test suites (%Execution Time), and the percentage of failures detected (%FailDetect). Note that the percentage of failures detected is relative to the total number of failures detected when all test suites are selected. From these three figures, we find that the number of failing test suites detected increases as δ decreases, and the percentage of failing test suites detected can reach 81.39% while executing only 19.24% of the test suites. Moreover, as the h in the decay rate increases, the number of failing test suites detected increases. Take δ = 32 as an example: when h = 6, the percentage of failing test suites detected is 41.82%; when h = 10, it is 45.7%; and when h = 50, it is 52.36%.

From the three figures, we can also see that the percentage of test suites selected for execution is very low, from 0.2% to 19.24% (except the one case with 79.7% test suite selection, whose δ is 0.5 and h is 50). As expected, using a smaller δ leads to more aggressive test suite selection. For example, in all three figures, the percentage of failures detected when δ = 0.5 is 35% more than when δ = 32. The reason is that as the predefined minimum support δ increases, each test suite T_i needs a higher C_k(T_i) to be selected. However, for each T_i, its previous transaction count C_k(T_i) is related to the number of its previous failures, which is fixed; so a higher δ causes fewer test suites to be selected, leading to fewer failing test suites being detected. As the decay-base-life h increases, the number of previous test suite results that affect the transaction count C_k(T_i) increases, and with it the number of failing results among them, so C_k(T_i) increases, leading to a higher percentage of failures detected.

Figure 3. Test Suite Selection: h = 6
Figure 4. Test Suite Selection: h = 10
Figure 5. Test Suite Selection: h = 50
(Each figure plots %TestSuite, %Execution Time, and %FailDetect against δ.)

RQ2: How does the cost-effectiveness of decay-based RTS compare with the previous time-window-based RTS?
Figure 6 and Figure 7 show the failure-detection trends against test suite selection. Each figure shows, for each percentage of test suites selected (on the x-axis), the corresponding percentage of failures detected (on the y-axis).
Figure 6 shows the results for the window-based test suite selection technique proposed by Elbaum et al. [5]. Both figures show that as the percentage of test suites selected increases, the percentage of failures detected also increases; this is because as the number of test suites selected increases, a larger percentage of the failing test suites is detected. Comparing Figure 6 with Figure 7, we find that in decay-based RTS all the failure-detection percentages are higher than 40%, whereas both We = 1 and We = 48 (We is the execution window, in hours) in window-based RTS contain several failure-detection percentages lower than 40%. In addition, in decay-based RTS, most of the points select a very low percentage (less than 3%) of test suites yet achieve more than 40% failure detection. In Figure 6, by contrast, when the execution window is 1 hour (We = 1), even though all of the failure-detection percentages are higher than 40%, the percentage of test suites selected is also very high, at more than 20%. From this comparison, we find that decay-based RTS is more cost-effective than window-based RTS.

Figure 6. Window-Based RTS (We = 1, 24, 48)
Figure 7. Decay-Based RTS (h = 6, 10, 50)

5. DISCUSSION
The RTS technique is influenced not only by the decay rate, as proposed in this paper, but also by many environmental factors, such as the resources available for test execution (the number of machines on which to run test suites), the rate at which change lists arrive, and the expense of executing test suites. In this section, I discuss some additional issues that are not considered in this paper.

Test suite arrival rate. In this paper, I did not consider the arrival rate of the test suites. For example, people are likely to submit more commits during weekdays and fewer during weekends, so more test suites are executed on weekdays than on weekends. With a decay mechanism, the effect of failure records on future testing results decreases over time, which means weekends also affect the determination of test suite selection. For example, test suite t1 may have a very high transaction count on Friday and should be selected for execution; however, after the weekend, the transaction count of t1 will have decreased by Monday even if t1 was not executed (or passed) during the weekend.

Running on parallel machines. In this paper, my program walks through each test suite to simulate the selection process, without considering the number of machines available; I assume all the test suites arrive and wait in a single queue. In a real industrial environment, these continuously waiting test suites must be managed according to resource availability to maximize testing throughput while minimizing overall execution time.

Selection of factors. The results rely on the choice of decay rate (through the decay-base-life h) and the predefined minimum support δ. This paper chose several values of h and δ to produce the results; if those values vary, the results may also vary. In future work, I will test more independent variables to analyze the relationships between the factors.

6. CONCLUSION
As continuous integration systems are increasingly adopted by large organizations for rapid release of new products, it is necessary to improve overall build efficiency. As an important part of the build process, regression testing should also be conducted cost-effectively. This paper proposed a decay-mechanism-based regression test selection technique for the pre-submit testing phase.
It adopts a decay rate to calculate each test suite's transaction count according to its failure records; if the transaction count is high enough, the test suite is selected for execution. In the empirical study conducted on the Google Shared Dataset of Test Suite Results (GSDTSR), the results show that this technique can select a very low percentage of test suites while detecting a high percentage of failures. For future work: (1) I will examine more independent variables to observe the trends and further improve the technique. (2) I will apply queuing models (M/M/1, M/M/k, etc.) to compare different combinations of queues and different numbers of server machines to support parallel execution scheduling. (3) I will improve the technique so that it can dynamically adjust the execution scheduling according to the arrival rate of test suites.

7. ACKNOWLEDGMENTS
My advisors, who helped me understand more about the previous work, supported this work, and they gave me many good ideas when I was preparing this implementation. I thank them very much.
8. REFERENCES
[1] P. M. Duvall, S. Matyas, and A. Glover. Continuous Integration: Improving Software Quality and Reducing Risk. Pearson Education, 2007.
[2] Atlassian. Atlassian software systems: Bamboo.
[3] Jenkins. Jenkins: An extendable open source continuous integration server. jenkins-ci.org, 2014.
[4] ThoughtWorks. Go: Continuous delivery.
[5] S. Elbaum, G. Rothermel, and J. Penix. Techniques for improving regression testing in continuous integration development environments. In FSE, 2014.
[6] J. H. Chang and W. S. Lee. Finding recent frequent itemsets adaptively over online data streams. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
[7] G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 2002.
[8] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Automata, Languages and Programming. Springer Berlin Heidelberg, 2002.
[9] C. H. Lee, C. R. Lin, and M. S. Chen. Sliding-window filtering: An efficient algorithm for incremental mining. In Proceedings of the Tenth International Conference on Information and Knowledge Management. ACM, 2001.
[10] G. Rothermel and M. J. Harrold. A safe, efficient regression test selection technique. ACM Transactions on Software Engineering and Methodology, 6(2), 1997.
[11] S. Bucur, V. Ureche, C. Zamfir, and G. Candea. Parallel symbolic execution for automated real-world software testing. In Proceedings of the Sixth Conference on Computer Systems, 2011.
[12] Y. Kim, M. Kim, and G. Rothermel. A scalable distributed concolic testing approach: An empirical evaluation. In Proceedings of the International Conference on Software Testing, Apr. 2012.
[13] M. Staats, P. Loyola, and G. Rothermel. Oracle-centric test case prioritization. In Proceedings of the International Symposium on Software Reliability Engineering, Nov. 2012.
[14] P. Gupta, M. Ivey, and J. Penix. Testing at the speed and scale of Google. the-google-test-and-development_21.html, 2014.
[15] S. Elbaum, J. Penix, and A. McLaughlin. Google shared dataset of test suite results. -shared-dataset-of-test-suite-results/, 2014.
[16] L. Zhang, S.-S. Hou, C. Guo, T. Xie, and H. Mei. Time-aware test-case prioritization using integer linear programming. In Proceedings of the International Symposium on Software Testing and Analysis, 2009.
[17] A. Orso, N. Shi, and M. J. Harrold. Scaling regression testing to large software systems. In Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2004.
[18] S. Yoo, R. Nilsson, and M. Harman. Faster fault finding at Google using multi-objective regression test optimisation. In 8th European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '11), Szeged, Hungary, 2011.