Toward Quantitative Process Management With Exploratory Data Analysis




Mark C. Paulk
Software Engineering Institute
Carnegie Mellon University
Pittsburgh, PA 15213

Capability Maturity Model and CMM are registered with the U.S. Patent and Trademark Office. Personal Software Process and PSP are service marks of Carnegie Mellon University. The Software Engineering Institute is a federally funded research and development center sponsored by the U.S. Department of Defense.

Abstract

The Capability Maturity Model for Software is a model for building organizational capability that has been widely adopted in the software community and beyond. The Software CMM is a five-level model that prescribes process improvement priorities for software organizations. Level 4 in the CMM focuses on using quantitative techniques, particularly statistical techniques, for controlling the software process. In statistical process control terms, this means eliminating assignable (or special) causes of variation. Organizations beginning to use quantitative management typically begin by "informally stabilizing" their process. This paper describes typical questions and issues associated with the exploratory data analysis involved in initiating quantitative process management.

Introduction

The Capability Maturity Model (CMM) for Software [Paulk95], developed by the Software Engineering Institute (SEI) at Carnegie Mellon University, is a model for building organizational capability that has been widely adopted in the software community and beyond. The Software CMM is a five-level model that describes good engineering and management practices and prescribes improvement priorities for software organizations. The five maturity levels are summarized in Figure 1.

The higher maturity levels in the CMM are based on applying quantitative techniques, particularly statistical techniques [Florac99], to controlling and improving the software process. In statistical process control (SPC) terms, level 4 focuses on removing assignable causes of variation, and level 5 focuses on systematically addressing common causes of variation. This gives the organization the ability to understand the past, control the present, and predict the future quantitatively. Regardless of the specific tools used (and control charts are implied by SPC), the foundation of levels 4 and 5 is statistical thinking [Hare95], which is based on three fundamental axioms:

- all work is a series of interconnected processes
- all processes are variable
- understanding variation is the basis for management by fact and systematic improvement

The statistical thinking characteristic of a high maturity organization depends on two fundamental principles. First, process data is collected at the process step level for real-time process control. This is perhaps the most important single attribute of a level 4 organization: engineers are using data to drive technical decision making in real time, thereby maximizing efficiency. Second, and a direct consequence of statistical thinking, decision making at the process level incorporates an understanding of variation.

A wide range of analytic techniques can be used for systematically understanding variation, ranging from simple graphs, such as histograms and bar charts, and statistical formulas, such as standard deviation, to statistical process control tools, such as XmR charts, u-charts, and beyond. The simplicity of a histogram does not lessen its power: a simple picture that imparts insight is more powerful than a sophisticated formula whose implications are not understood.
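To make the mechanics concrete, the sketch below computes the natural process limits for an individuals and moving-range (XmR) chart. It is a minimal illustration added for this edition, not part of the original paper; the data series and function name are hypothetical, and the constant 2.66 is the standard scaling factor for moving ranges of subgroup size two.

```python
# Minimal sketch of XmR (individuals and moving range) chart limits.
# The constant 2.66 is the conventional scaling factor for moving
# ranges of subgroup size two (3 / d2, with d2 = 1.128).

def xmr_limits(values):
    """Return (center, lower limit, upper limit) for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar

# Hypothetical series, e.g. defects found per inspection over time.
series = [3, 1, 4, 2, 2, 5, 1, 3]
center, lcl, ucl = xmr_limits(series)
print(f"center={center:.2f}  LCL={max(lcl, 0.0):.2f}  UCL={ucl:.2f}")
```

Plotting a series against such limits, or simply looking at a histogram of the values, is often enough to reveal the mixing and stratification patterns discussed below.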

Figure 1. An overview of the Software CMM.

- Level 5, Optimizing. Focus: continual process improvement. Key process areas: Defect Prevention, Technology Change Management, Process Change Management.
- Level 4, Managed. Focus: product and process quality. Key process areas: Quantitative Process Management, Software Quality Management.
- Level 3, Defined. Focus: engineering processes and organizational support. Key process areas: Organization Process Focus, Organization Process Definition, Training Program, Integrated Software Management, Software Product Engineering, Intergroup Coordination, Peer Reviews.
- Level 2, Repeatable. Focus: project management processes. Key process areas: Requirements Management, Software Project Planning, Software Project Tracking & Oversight, Software Subcontract Management, Software Quality Assurance, Software Configuration Management.
- Level 1, Initial. Focus: competent people and heroics. No key process areas.

Although the Software CMM has been extensively used to guide software process improvement, the majority of software organizations are at the lower maturity levels; as of March 1999, of the 807 organizations active in the SEI's assessment database, only 35 were at levels 4 and 5. While the number of high maturity organizations is growing rapidly, it takes time to institutionalize a measurement program and the quantitative management practices that take good advantage of its capabilities. The typical software organization takes over two years to move from level 1 to level 2 and from level 2 to level 3 [Herbsleb97]. One to two years is a reasonable expectation for building, deploying, and refining quantitatively managed processes.

One of the challenges in moving to level 4 is the discovery organizations typically make when looking at their process data: the defined processes used by the projects are not as consistently implemented or measured as believed. When a process is placed under statistical process control in a rigorous sense, it is "stabilized" by removing assignable causes of variation. "Informal stabilization" occurs simply by examining the data (graphically) before even placing it on a control chart, as patterns in the data suggestive of mixing and stratification become visible.

If there is a great deal of variability in the data, a common complaint when arguing that SPC cannot be applied to the software process [Ould96], the control limits on a control chart will be wide. High variability has consequences: if the limits are wide, predictability is poor, and highly variable performance is to be expected in the future. If highly variable performance is unacceptable, then the process will have to be changed; ignoring reality will not change it. Since some studies suggest a 20:1 difference in the performance of programmers, variability is a fact of life in a design-intensive, human-centric process. The impact of a disciplined process can nonetheless be significant in minimizing variation while improving both quality and productivity, as demonstrated by the Personal Software Process (PSP) [Humphrey95, Hayes97]. Some software organizations are already using control charts appropriately and to good effect [Paulk99a, Paulk99b], so there are existing examples of SPC providing business value for software.

Informally stabilizing the process can be characterized as an exercise in exploratory data analysis, which is a precursor to the true quantitative management of level 4. The processes that are stabilized first tend to be design, code, and test, since there is usually an adequate amount of inspection and test data to apply statistical techniques in a fairly straightforward manner. The fairly typical subset of code inspection data in Table 1 illustrates what an organization might start with. The organization that provided this data was piloting the use of control charts on a maintenance project.

Table 1. Representative Code Inspection Data From an Organization Beginning Quantitative Management.

  Number of    Inspection          Code Inspection Time               Number of    Lines of
  Inspectors   Preparation Time    (inspectors x inspection hours)    Defects      Code
      7             7.8                      13.5                         2           54.3
      6             8.1                       9.5                         2           87.4
      5             4.4                       1.3                         0           60.1
      5             2.8                       2.5                         1         1320.6
      6            18.0                       0.9                         2          116.2
      6             4.7                       1.5                         2           46.6
      6             3.0                       3.0                         0          301.6
      5             2.6                       2.5                         3           62.0
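As a first exploratory pass over data like this, one might simply encode the rows and look at the spread of each column. The sketch below does that with Table 1's values; the field order and the use of plain tuples are editorial choices for illustration, not the organization's own format.

```python
from statistics import mean, stdev

# Rows of Table 1, in the column order shown above:
# (inspectors, preparation time, inspection time, defects, lines of code).
inspections = [
    (7,  7.8, 13.5, 2,   54.3),
    (6,  8.1,  9.5, 2,   87.4),
    (5,  4.4,  1.3, 0,   60.1),
    (5,  2.8,  2.5, 1, 1320.6),
    (6, 18.0,  0.9, 2,  116.2),
    (6,  4.7,  1.5, 2,   46.6),
    (6,  3.0,  3.0, 0,  301.6),
    (5,  2.6,  2.5, 3,   62.0),
]

labels = ["inspectors", "preparation time", "inspection time", "defects", "lines of code"]
for label, column in zip(labels, zip(*inspections)):
    print(f"{label:>17}: mean = {mean(column):8.2f}   std dev = {stdev(column):8.2f}")
```

Even this crude summary hints at the issues explored below: the lines-of-code column spans more than an order of magnitude, and its standard deviation exceeds its mean.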

If you were asked to analyze the data in Table 1, what questions might you ask? They will probably fall into four broad categories: operational definitions, process consistency, aggregation, and organizational implications.

Operational Definitions

Good operational definitions must satisfy two important criteria [Florac99]:

- Communication. If someone uses the definition as a basis for measuring or describing a measurement result, will others know precisely what has been measured, how it was measured, and what has been included and excluded?
- Repeatability. Could others, armed with the definition, repeat the measurements and get essentially the same results?

In looking at the data in Table 1, the first question is likely to be, "How is a line of code defined?" The fact that the LOC values are not integers rings a bell; indeed, on first hearing that this is a maintenance project, the question should already have arisen, "How do they deal with modified, deleted, and unchanged lines?" In this case, a formula was used to create an aggregate size measure that is a weighted function of new, modified, deleted, and unchanged lines. It is more important to know that the formula exists and is being consistently used than to know what the specific formula is (a purely illustrative sketch of such a definition appears at the end of this section).

The second question might be, "What are the time units?" One panelist at the 1999 European SEPG Conference reported a case where the unit was not clearly communicated, and they discovered that their time data included both hours and minutes (assuming every value of 5 or less was hours was their pragmatic solution).

The third question is obviously, "How is a defect defined?" While the first two metrics can be collected in a fairly objective fashion, getting a good operational definition of "defect" can be challenging. Are multiple severity levels included, from life-critical to major to trivial? Are trivial defects even recorded? How does the inspection team determine what category a defect belongs in? Again, the crucial question is whether the data can be collected consistently and repeatably.

A fourth question should be, "Is the data collected at the same point in the process each time?" For example, are code inspections performed before or after a clean compile is obtained? If this is not specified, there may be a mix of compiled/not-compiled inspection data, which will increase variability significantly.
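The paper deliberately does not give the organization's size formula; what matters is that one exists and is applied consistently. Purely as an illustration of what such operational definitions might look like, the sketch below defines a hypothetical weighted size measure and an explicit time-unit conversion (avoiding the "values of 5 or less are hours" workaround mentioned above). The weights, names, and example numbers are all invented.

```python
# Hypothetical aggregate size measure for maintenance work: a weighted sum of
# new, modified, deleted, and unchanged lines. The weights are invented for
# illustration; the paper only says that such a formula existed and was
# applied consistently.
SIZE_WEIGHTS = {"new": 1.0, "modified": 0.6, "deleted": 0.2, "unchanged": 0.05}

def aggregate_size(new, modified, deleted, unchanged):
    counts = {"new": new, "modified": modified, "deleted": deleted, "unchanged": unchanged}
    return sum(SIZE_WEIGHTS[kind] * count for kind, count in counts.items())

def hours(value, unit):
    """Record effort with an explicit unit so hours and minutes cannot be mixed."""
    if unit == "hours":
        return value
    if unit == "minutes":
        return value / 60.0
    raise ValueError(f"unknown time unit: {unit!r}")

print(aggregate_size(new=20, modified=35, deleted=10, unchanged=60))  # 46.0 "lines"
print(hours(90, "minutes"))                                           # 1.5 hours
```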
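Reading Table 1's third column as inspector-hours (the number of inspectors times the meeting hours, as its header states), a quick check like the one below reproduces the observation that some inspections reviewed code far faster than a 150 LOC per hour guideline. The check is an editorial illustration, not a procedure from the paper.

```python
# Rough consistency check on the Table 1 inspections: compare the review
# rate (LOC per meeting hour) against a nominal guideline of about 150
# LOC per hour. Rows are (inspectors, prep, inspector_hours, defects, loc).
GUIDELINE_LOC_PER_HOUR = 150.0

inspections = [
    (7,  7.8, 13.5, 2,   54.3),
    (6,  8.1,  9.5, 2,   87.4),
    (5,  4.4,  1.3, 0,   60.1),
    (5,  2.8,  2.5, 1, 1320.6),
    (6, 18.0,  0.9, 2,  116.2),
    (6,  4.7,  1.5, 2,   46.6),
    (6,  3.0,  3.0, 0,  301.6),
    (5,  2.6,  2.5, 3,   62.0),
]

for inspectors, _prep, inspector_hours, _defects, loc in inspections:
    meeting_hours = inspector_hours / inspectors   # undo the "inspectors x hours" aggregation
    rate = loc / meeting_hours
    flag = "  <-- above guideline" if rate > GUIDELINE_LOC_PER_HOUR else ""
    print(f"{loc:7.1f} LOC in {meeting_hours:5.2f} h  ->  {rate:7.0f} LOC/hour{flag}")
```

Under this reading, the 1320.6 LOC module works out to more than 2,600 LOC per hour, consistent with the observation above; a different reading of the time column would change the numbers but not the underlying consistency question.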

Aggregation

When analyzing process data, there are many potential sources of variation in the process, and it is easy to overlook them when data are aggregated. Common causes of overly aggregated data include [Florac99]:

- poor operational definitions
- inadequate contextual information
- lack of traceability from data back to its original context
- working with data whose elements are combinations (mixtures) of values from different sources

The predominant source of aggregated data is simply that different work products are produced by different members of the project team. Collecting data on an individual basis would address this, but could have severe consequences, both in terms of motivational use of the data, e.g., during performance appraisals, which can lead to dysfunctional behavior [Austin96], and in terms of the amount of data available for statistical analyses. There are no easy answers to this question. It is, however, possible on occasion to disaggregate data. For example, defect data could be separated into different categories, and control charts on each category may provide significantly better insight into separate common cause systems [Florac99].
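One way to act on the disaggregation suggestion is to split the defect stream by category and compute separate limits for each group rather than one set of limits for the mixture. The sketch below does this with invented categories and counts, reusing the XmR limit calculation from the earlier sketch; in practice one would use the organization's own defect taxonomy.

```python
from collections import defaultdict

def xmr_limits(values):
    """Individuals-chart natural process limits (mean +/- 2.66 * mean moving range)."""
    ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(ranges) / len(ranges)
    return center - 2.66 * mr_bar, center + 2.66 * mr_bar

# Hypothetical per-inspection defect counts tagged with a category.
observations = [
    ("logic", 3), ("interface", 1), ("logic", 4), ("documentation", 0),
    ("logic", 2), ("interface", 2), ("documentation", 1), ("logic", 5),
    ("interface", 1), ("documentation", 0), ("logic", 3), ("interface", 2),
]

by_category = defaultdict(list)
for category, count in observations:
    by_category[category].append(count)

# One set of limits per category may expose distinct common-cause systems
# that a single aggregated chart would hide.
for category, counts in by_category.items():
    lcl, ucl = xmr_limits(counts)
    print(f"{category:>13}: n={len(counts)}  LCL={max(lcl, 0.0):.2f}  UCL={ucl:.2f}")
```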

Organizational Implications

In the particular example we have gone through above, the data was used within a single project. When dealing with organizational data, these problems are exacerbated. In moving between projects, application domains, and customers, operational definitions may be "adjusted" to suit the unique needs of the new environment; it is therefore crucial to understand the context of the data when doing cross-project comparisons. It can be particularly challenging when government regulations or customers demand that data be reported in different ways than the organization would normally collect it.

Conclusion

This paper provides a simple road map through some of the issues that an analyst must deal with in implementing quantitative process management. As we frequently say about the CMM, this is not rocket science, but it is easy to miss an important point, and it can be quite frustrating at times to work through these issues. These are, however, typical problems that most organizations have to work through on the journey of continual process improvement; "informal stabilization" seems to be a necessary precursor to the useful application of rigorous SPC techniques.

References

[Austin96] Robert D. Austin, Measuring and Managing Performance in Organizations, Dorset House Publishing, New York, NY, ISBN 0-932633-36-6, 1996.

[Fagan86] M.E. Fagan, "Advances in Software Inspections," IEEE Transactions on Software Engineering, Vol. 12, No. 7, July 1986, pp. 744-751; reprinted in Software Engineering Project Management, R.H. Thayer (ed.), IEEE Computer Society Press, IEEE Catalog No. EH0263-4, 1988, pp. 416-423.

[Florac99] William A. Florac and Anita D. Carleton, Measuring the Software Process: Statistical Process Control for Software Process Improvement, Addison-Wesley, Reading, MA, ISBN 0-201-60444-2, 1999.

[Hare95] Lynne B. Hare, Roger W. Hoerl, John D. Hromi, and Ronald D. Snee, "The Role of Statistical Thinking in Management," ASQC Quality Progress, Vol. 28, No. 2, February 1995, pp. 53-60.

[Hayes97] Will Hayes and James W. Over, "The Personal Software Process (PSP): An Empirical Study of the Impact of PSP on Individual Engineers," Software Engineering Institute, Carnegie Mellon University, CMU/SEI-97-TR-001, December 1997.

[Herbsleb97] James Herbsleb, David Zubrow, Dennis Goldenson, Will Hayes, and Mark Paulk, "Software Quality and the Capability Maturity Model," Communications of the ACM, Vol. 40, No. 6, June 1997, pp. 30-40.

[Humphrey95] Watts S. Humphrey, A Discipline for Software Engineering, Addison-Wesley, Reading, MA, ISBN 0-201-54610-8, 1995.

[Ould96] Martyn A. Ould, "CMM and ISO 9001," Software Process: Improvement and Practice, Vol. 2, Issue 4, December 1996, pp. 281-289.

[Paulk95] Carnegie Mellon University, Software Engineering Institute (principal contributors and editors: Mark C. Paulk, Charles V. Weber, Bill Curtis, and Mary Beth Chrissis), The Capability Maturity Model: Guidelines for Improving the Software Process, Addison-Wesley, Reading, MA, ISBN 0-201-54664-7, 1995.

[Paulk99a] Mark C. Paulk, "Practices of High Maturity Organizations," The 11th Software Engineering Process Group (SEPG) Conference, Atlanta, Georgia, 8-11 March 1999.

[Paulk99b] Mark C. Paulk, "Using the Software CMM With Good Judgment," ASQ Software Quality Professional, Vol. 1, No. 3, June 1999, pp. 19-29.