Ensure Merge Accuracy in Continuous Integration Development Environments
Defect Challenges in Continuous Integration Development

Modern software development is rapidly moving to incremental development undertaken in a collaborative team model. And while this is the pervasive trend, it presents a significant new challenge to software quality. As organizations achieve more rapid software development cycles using new tools to streamline the process, ensuring merge accuracy has become an increasingly difficult issue to manage.

The Origin of Merge Accuracy Problems

Here's how it looks on the ground: Developers fix bugs and develop features in a side development branch. Then, these fixes and features are merged into the main trunk for release. Release managers must determine two critical things before the release ships:

> Is the release missing any bug fixes or planned features?
> Are all bug fixes and features correctly merged into the release?

Answering these questions quickly, under pressure and with certainty is a daunting task. And it is common for release teams to spend a great deal of manual effort and time getting to verifiable answers.

How Things Go Terribly Wrong

We all know that bug fixes and features fail to make releases due to human error. In particular, when a process involves tens to hundreds of developers and hundreds to thousands of bug fixes and features, the chances of this failure happening increase significantly.

Further, it's also common for a bug fix or feature to be broken in the merge process. This occurs because a merge conflict is resolved in the wrong way: An integration engineer may not understand the full context of the code when resolving a merge conflict. Or, a resolution suggested by the text-based merging tool is accepted without due rigor on the part of the developer. It can even occur when an engineer accidentally reverts part of someone else's change.

This is made even more problematic by a common practice in today's large-scale software development:

> Parallel branching.
This is the predominant strategy employed to streamline software development and support release management needs. For example, it is very common for software companies to maintain many old releases in the field even while working on a new release. In some organizations, it is also necessary to maintain multiple customized versions, each specialized for a specific customer. Developers continuously make code changes, including bug fixes, in these versions. And it is very important to ensure these changes are integrated into the new release.

SCM Systems Can't Determine Merge Accuracy

Software configuration management (SCM) systems are designed to track and control changes in source code and are an important component in continuous integration development. Software companies make well-intentioned attempts to ensure correct changes are properly ported into releases. Often, the SCM solution is a key component in that attempt. However, the problem of ensuring merge accuracy is bigger than many teams think. The result is that changesets in development branches often end up missing in production releases.

Here are the common causes of this problem:
1. SCM Limitations

> SCM systems rely on manual input. Developers must take the right action during merge or key in the right information. But developers don't always use the integration command to merge code. Instead, the changesets are managed manually. And that leaves no trace in the SCM log.
> SCM systems don't understand code and treat it as text. For example, SCM systems cannot track duplicated content. And if a file is removed and added back in, SCM systems cannot track the changes between these two files. Additionally, if a change has been made to one branch, even if the change is superficial (e.g., a variable rename), the SCM system can fail to perform the integration automatically and the developer will have to decide manually what action to take.
> SCM systems don't understand bugs and features. Although tedious, it may be possible to determine which changeset is missing in a branch, but a bug fix or feature usually contains multiple changesets. The SCM system cannot tell you if a bug fix or feature has been merged correctly; knowing the status of each individual changeset does not solve this larger problem. The results must be aggregated to be meaningful.
> Legacy SCM systems cannot perform even basic changeset tracking. Regularly used in development shops, these earlier-generation SCM systems don't support the notion of a changeset (or changelist). They were designed to do one thing: track revisions in files. Naturally, they fail in their attempts to track changesets. This means the developer is unable to determine the complete change that needs to be applied, especially if that change spans multiple segments of code in multiple files. Manual workaround rodeos become the methodology that teams saddled with these older systems are forced to use.

2. Regular-Expression-Based Tools

Another method often attempted is keyword search or regular-expression-based tools. These are truly lacking in their scope of usefulness as well.
Here's why:

> Regular expressions to capture a bug fix or feature across multiple segments of code are virtually impossible to write. It is extremely difficult to determine the right query to get the results you want. A bug fix tends to involve many snippets in different files. A simple keyword search or regex-based search cannot capture this complexity.
> The results are usually very noisy, since the search lacks context. Search results contain extraordinary volumes of irrelevant results. The overhead of processing them and narrowing down the candidate code snippets is prohibitive.
> It is impossible to be certain that the results are complete. For instance, cases where the buggy code has since been modified are consistently missed.

3. SCM System Migration Increases Defect Issues

Modern SCM systems have evolved to accommodate continuous integration development. Git's fast branching and merging, for example, makes it an appealing SCM system for Agile environments. While it's common for a company to be using one SCM system, such as CVS, for an old product, teams are frequently migrating to Git or Perforce for a newer product line. These migrations tend to be lengthy, and development groups must manually inspect all code to assure every change made in one system is also made in the other; there is no cross-SCM checking mechanism available. The opportunity for changes to be missed, or for incorrect changes to be merged into a production release, increases significantly in this situation.
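To make the text-versus-code distinction above concrete, here is a minimal Python sketch of why a keyword or regex search loses a fix after a superficial variable rename, while a comparison over normalized tokens still finds it. The `normalize` helper, the tiny keyword set and the C-like snippets are all assumptions made for illustration; this is not how any particular SCM system or commercial tool works, and a real solution would parse the language properly.

```python
import re

# Hypothetical, deliberately tiny keyword set; a real parser would know the grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}

# An identifier/keyword, a run of digits, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(snippet: str) -> list[str]:
    """Tokenize a snippet and replace each distinct identifier with a
    placeholder numbered by order of first appearance (ID0, ID1, ...)."""
    names: dict[str, str] = {}
    tokens = []
    for tok in TOKEN.findall(snippet):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tok = names.setdefault(tok, f"ID{len(names)}")
        tokens.append(tok)
    return tokens


fix = "if (len > max) return -1;"          # the fix as written on the side branch
ported = "if (length > limit) return -1;"  # the same fix after a rename on the trunk

# A literal or regex search for the original text no longer matches.
print(re.search(r"len\s*>\s*max", ported) is not None)
# The normalized token streams are identical, so the fix is still recognized.
print(normalize(fix) == normalize(ported))
```

Because placeholders are assigned by position, a change that alters the logic (for example, flipping `>` to `<`) still produces a different token stream, so this kind of matching tolerates renames without ignoring real differences.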
What Is the Solution?

An effective solution to the problem of ensuring merge accuracy must:

> Parse the code: Unless a tool can parse and understand the code, it's impossible to identify semantic patterns in code, tolerate superficial changes and make structural and semantic matches.
> Not rely on metadata: Maintaining code quality in continuous integration development environments necessitates viewing the actual code itself versus relying on metadata.
> Be automated: The only way for a solution to scale is to be fully automatable: "I merged new code, now give me a report," or "take all the changesets and tell me if they are the proper ones for the new release." Without this capability, wide adoption simply will not happen.

Pattern Insight's Code Insight

Code Insight from Pattern Insight is the only software solution that ensures your continuous integration merges are accurate. It is based on patent-pending fuzzy matching technology, which can tolerate any variable name change or statement insertion and deletion.

Figure 1: Ensure that all changes are ported into final release.

Code Insight takes a list of branches containing bug fixes and features, analyzes the code changes associated with them, and checks against the target branch to see if any change is missing or broken, all in an automated fashion. This check can be run at release time, but more commonly it is done in a continuous integration process, where Code Insight periodically generates an incremental report listing any missing or broken changes in a target branch.

Code Insight is:

> Fast: Returns results in seconds, even for billions of lines of code
> Accurate: Extremely low false-positive rates
> Easy-to-use: Fully integrates with all SCM systems and is capable of being used in any development, build and release process
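As a rough illustration of the workflow described above (take the changes behind each bug fix, look for them in the target branch, and aggregate the result per fix), here is a minimal Python sketch. It is not Code Insight's algorithm, which this paper does not describe; Python's standard `difflib.SequenceMatcher` stands in for fuzzy matching, and the function names, the 0.8 threshold, the bug IDs and the "partial" status (a hint that a merge may be broken) are all assumptions.

```python
from difflib import SequenceMatcher


def hunk_present(hunk: list[str], target: list[str], threshold: float = 0.8) -> bool:
    """True if some same-sized window of the target branch is at least
    `threshold` similar to the changed lines (indentation is ignored)."""
    want = "\n".join(line.strip() for line in hunk)
    size = len(hunk)
    for start in range(len(target) - size + 1):
        have = "\n".join(line.strip() for line in target[start:start + size])
        if SequenceMatcher(None, want, have).ratio() >= threshold:
            return True
    return False


def check_fix(hunks: list[list[str]], target: list[str]) -> str:
    """Aggregate per-hunk results into one status for the whole bug fix."""
    found = sum(hunk_present(h, target) for h in hunks)
    if found == len(hunks):
        return "merged"
    return "missing" if found == 0 else "partial"


def report(fixes: dict[str, list[list[str]]], target: list[str]) -> dict[str, str]:
    """Map each bug ID to the merge status of its fix in the target branch."""
    return {bug_id: check_fix(hunks, target) for bug_id, hunks in fixes.items()}


# Hypothetical data: the target branch renamed `len` to `length`.
target_branch = [
    "int read(char *buf, int length) {",
    "    if (length > LIMIT) return -1;",
    "    return fill(buf, length);",
    "}",
]
fixes = {
    "BUG-101": [["if (len > LIMIT) return -1;"]],
    "BUG-102": [["if (buf == NULL) return -1;"]],
    "BUG-103": [["if (len > LIMIT) return -1;"], ['log_error("overflow");']],
}
for bug_id, status in report(fixes, target_branch).items():
    print(bug_id, status)
```

Reporting one status per bug fix rather than per changeset reflects the point made earlier: the per-changeset results only become meaningful once they are aggregated.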
Further, Code Insight:

> Finds all similar code patterns within a single version or a branch of a code base.
> Compares two versions or branches of a single code base, and identifies all code segments that are similar or identical between them.
> Compares two different code bases and identifies all code segments that are similar or identical between them.

Code Insight can be used in other workflow scenarios, as well:

Ensure releases are clean. For Build/Release Owners, Code Insight easily integrates into the release process or continuous integration to identify previously fixed bugs in releases going out the door. For example, one of our customers in the mobile device industry built a catalog of thousands of security and other Priority-1 bugs and runs nightly reports to determine if any of these have leaked into its hundreds of builds. If a match is found, the build is blocked and the developer is notified automatically.

Figure 2: Ensure releases are clean.

Eliminate all instances of bugs in development. Developers want to ensure that a bug has been fixed across all locations, branches and components in the development process. In this catch-point use case, the developer uses Pattern Insight's Code Insight to quickly run a report that finds every instance of a bug, and then fixes them immediately. For more automation, the developer can be informed during code review or at code check-in.

Figure 3: Eliminate all instances of bugs in development.
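The nightly release check described under "Ensure releases are clean" can be pictured as a small gate in the build pipeline. The Python sketch below is only an outline under stated assumptions: the catalog format, the function names and the bug ID are hypothetical, whitespace-insensitive substring matching stands in for the product's fuzzy matching, and a message on standard error stands in for notifying the developer.

```python
import re
import sys


def squash(code: str) -> str:
    """Collapse whitespace runs so formatting differences do not hide a match."""
    return re.sub(r"\s+", " ", code).strip()


def find_leaks(catalog: dict[str, str], build_files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (bug_id, path) pairs where a cataloged buggy snippet
    appears in one of the files that make up a build."""
    leaks = []
    for path, source in build_files.items():
        flat = squash(source)
        for bug_id, buggy in catalog.items():
            if squash(buggy) in flat:
                leaks.append((bug_id, path))
    return leaks


def gate(catalog: dict[str, str], build_files: dict[str, str]) -> int:
    """Exit status for a CI step: 0 lets the build through, 1 blocks it."""
    leaks = find_leaks(catalog, build_files)
    for bug_id, path in leaks:
        # Stand-in for an automatic notification to the responsible developer.
        print(f"BLOCKED: {bug_id} found in {path}", file=sys.stderr)
    return 1 if leaks else 0


# Hypothetical catalog entry and build contents.
catalog = {"SEC-7": "strcpy(dst,   src);"}
build = {
    "net/io.c": "void f() {\n  strcpy(dst,\n      src);\n}",
    "ui/main.c": "int main() { return 0; }",
}
status = gate(catalog, build)
```

In a real pipeline the returned status would be passed to `sys.exit` so that the CI system marks the build as failed.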
Conclusion: Gain Insight into the Code Itself to Ensure All Changes are Correctly Ported into Final Release

Release engineers deploy Pattern Insight to automatically ensure that all the changes made in separate branches make it into the final release. Hundreds or thousands of changes can be checked in one single run. And this process only takes a matter of minutes compared to manual verification, which may take months. Pattern Insight is the only automated means to pinpoint missing changesets in your source code.

It is essential to gain insight into the source code itself versus relying on metadata. You need an accurate assessment versus best guesses as to the quality of your software. The manual processes that engineering departments rely on are risky and inadequate in light of the explosive growth in artifacts generated in the shorter development cycles. And as continuous integration development becomes more popular, newer SCM systems designed around this methodology present additional challenges.

These factors have caused major software companies, including Cisco, Qualcomm, Motorola, Wind River and EMC, to adopt Pattern Insight.

To learn more about Pattern Insight's solutions, go to PatternInsight.com.
Pattern Insight Headquarters
465 Fairchild Drive, Suite 209
Mountain View, CA 94043
P: 866 582 2655
F: 408 573 7855
E: info@patterninsight.com
PatternInsight.com

Copyright 2012 Pattern Insight Inc. All Rights Reserved.